Closed Bug 31770 Opened 24 years ago Closed 23 years ago

Find on Page [and in view source] extremely slow

Categories

(SeaMonkey :: UI Design, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.5

People

(Reporter: spam, Assigned: mozeditor)

References

Details

(Keywords: perf, topperf, Whiteboard: [PDT-] ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf])

Attachments

(19 files)

257.35 KB, text/html
Details
89.16 KB, text/html
Details
2.40 KB, text/plain
Details
9.97 KB, patch
Details | Diff | Splinter Review
337.75 KB, text/html
Details
17.38 KB, patch
Details | Diff | Splinter Review
16.79 KB, patch
Details | Diff | Splinter Review
18.45 KB, patch
Details | Diff | Splinter Review
28.73 KB, patch
Details | Diff | Splinter Review
29.88 KB, patch
Details | Diff | Splinter Review
29.88 KB, patch
Details | Diff | Splinter Review
31.45 KB, patch
Details | Diff | Splinter Review
33.41 KB, patch
Details | Diff | Splinter Review
34.20 KB, patch
Details | Diff | Splinter Review
35.24 KB, patch
Details | Diff | Splinter Review
13.51 KB, patch
Details | Diff | Splinter Review
18.65 KB, patch
Details | Diff | Splinter Review
23.22 KB, patch
jesup
: review+
Details | Diff | Splinter Review
23.27 KB, patch
kinmoz
: review+
kinmoz
: superreview+
Details | Diff | Splinter Review
To avoid filing duplicates I searched Bugzilla for new, reopened, verified,
fixed, and unverified bugs. Got around 3600 hits. (The resulting page loaded in 5
minutes.) Now: searching with "find on page" got slower and slower; to find the
next occurrence of a word mentioned twice in one sentence took around 15 seconds
when I was halfway down the page. Almost as if it searched the whole page from the
start all over again each time the "find" button was clicked. In effect it's so
slow it's useless. I can read the page myself faster than the search tool can.
Build ID: 2000031318
search dialog bug
Assignee: matt → law
reporter - what sort of system (processor, etc.) are you on?  When you search,
how high is the CPU usage?
Intel P120, 96 MB RAM, RedHat6, somewhat upgraded. Gnome/sawmill. No problems.
I did a comparison, a little "manual benchmark".
This query was performed first in Netscape 4.7, then in Mozilla Build ID
2000031418:
http://bugzilla.mozilla.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&rep_platform=PC&op_sys=Linux&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&short_desc=&short_desc_type=substring&long_desc=&long_desc_type=substring&bug_file_loc=&bug_file_loc_type=substring&status_whiteboard=&status_whiteboard_type=substring&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=Reuse+same+sort+as+last+time


(that got long, heh...)
Well:
Netscape found all 26 occurrences of the word "ftp" (no quotes) in that page in
45 seconds.
Mozilla found them in 6 minutes, 35 seconds.

The test was performed by clicking "search" again as soon as the word was
found. Netscape 4.7 was almost 9 times faster, in other words.
What is very striking is that Mozilla gets slower and slower as it searches
further down the page. In the end it takes around 15 seconds to find the second
occurrence of ftp in a sentence looking like this:  ftp://ftp.... (etc)
And yes, CPU load was high while Mozilla searched: it uses what it can;
figures like 87% and 95% showed up in "top" when I tested it afterwards. (No
"top" was running while the comparative search was done, and the apps were
running "alone", under the same conditions.) The only "difference" was that
this was done manually, and since Netscape found the word "ftp" very quickly at
times, I may have been a little slow hitting the find button again. If it made
any difference, it is to Mozilla's discredit, unfortunately.
Confirming bug. I tried it on WinNT with a 450MHz P2 and the lag is still noticeable. Very telling is the case where ftp occurs again
only a few characters away from the last occurrence. 4.x finds this instantaneously. Seamonkey takes just as long as for
the others. I get the impression Seamonkey searches the whole page up to the point where it last found an occurrence each time
through, whereas 4.x just picks up searching wherever it left off (at the last occurrence).

I'm also changing the component and QA.
Status: UNCONFIRMED → NEW
Component: Search → XPApps
Ever confirmed: true
QA Contact: claudius → sairuh
Summary: Searh/Find on Page extremely slow → Find on Page extremely slow
h'm, I didn't really see much of a lag between hits of the Find button (searched
for the string 'ftp' in the link above from dark@c2i.net):

linux [2000.03.15.06-nb1b, mozilla] -- 2 seconds (okay, that's slow)
mac [2000.03.15.05-nb1b, mozilla] -- less than 1 second
winNT [2000.03.15.06-nb1b, commercial] -- less than 1 second

however, I did notice that when I first went to that page and initially brought
up the Find On Page dialog, it would take 2-6 seconds for the dialog to
appear. Did anyone else see that?
Yes :) But that's the least of the problems..
sairuh: A second thought... Did you go through the whole procedure, finding *all*
(26?) occurrences of the word ftp in that page?
This is probably a performance issue with nsITextServices, which kin wrote.
cc:ing kin. FYI, spellcheck in editor uses the same service, so it may also show
performance problems on long documents.
ah... I think I see what you mean: I only searched a few times, i.e., the first 3
or so instances of "ftp".

so, I just tried searching through the whole page, and what I notice is that
searches in the first half of the page take only ~1 sec... but once I'm searching
through the latter half of the page, it takes more like 2-3 sec between each
search (longer on linux than on mac or winNT). but not extremely slow from my
perspective...
I ought to look you up in the Silmarillion... Regardless of horsepower, the factors
here speak for themselves, I think: a 6.5 minute search versus 45 seconds. This can
be done much better.
Giving this to kin.  The problem has to be in the document text services.
Assignee: law → kin
Keywords: perf
Target Milestone: --- → M16
I've noticed a considerable slowdown of find as well.  It would seem to be
related to an insane amount of memory usage.  After the first find, virtual
memory usage jumps up about 60 or 70 MB, and then slowly creeps up with each
additional find.

I don't think it's leaks, per se, as all of the memory is immediately freed when
you close the find dialog.

This is with Linux on a Celeron 500, 128MB RAM.

*** Bug 36005 has been marked as a duplicate of this bug. ***
Accepting bug.
Status: NEW → ASSIGNED
Target Milestone: M16 → M17
moving to m17
*** Bug 45782 has been marked as a duplicate of this bug. ***
Keywords: nsbeta3
Component: XP Apps → XP Apps: GUI Features
setting to m18, nsbeta3
Target Milestone: M17 → M18
I did a jprof profile (to be attached) of find-in-page in nsBlockFrame.cpp (for
a string that wasn't in the file).  The basic problem here is that
nsContentIterator::NextNode (for post-order traversal) calls
|parent->IndexOf(cN, indx)|, which is O(N) on the length of the array.  On a
huge pre element, this is really really bad, because when you iterate through
the whole list, the iteration is O(N^2) when it should be O(N).

nsContentIterator needs to store the index of the current node.  If there's a
possibility that the list changes during iteration, then it should double check
that the index is still valid, and *if not*, do the O(N) search.  It's probably
worth going through nsContentIterator and seeing if other things need cleaning
up too...
Note that since that's a realtime profile (rather than time spent by the app),
g_main_poll shows up.  It should be ignored.
Cc jfrancis@netscape.com since he is the author of the content iterator.
I have no idea how to read that profile.  I'll have to reprofile this with a tool
whose output I can read.  Caching the index will make the iterator faster, but
does that explain a 15-second delay between finding two occurrences of a string on
the same line?  I smell another problem...
To see how to read the profile:
http://lxr.mozilla.org/mozilla/source/tools/jprof/README.html

What the profile is showing is that 73% of the time in Find in Page is spent in
the call to IndexOf() within nsContentIterator::NextNode.  That's clearly a
problem.
Right.  But while IndexOf is slow, that doesn't mean that's the problem.  The real
problem may be that we are iterating unnecessarily, etc.  It's hard to see how
two occurrences of the string within the same node should need to call IndexOf a
bunch of times.
moving to m19
Target Milestone: M18 → M19
for instance, nsTextServicesDocument::GetCurrentTextBlock() is spending all of
its time in the iterator.  That smells bad.  Do you want to look at that, Kin?

On another front, I'm wondering if I can get away with the caching David
suggests.  I can if all the callers are keeping their contract not to use the
iterator while changing the content they are iterating.  I think right now you are
only hosed if you delete content, but with that change I think you might be hosed
when adding content as well.

Maybe I should just try it and see what breaks.  :-P
jfrancis:  If you're worried about users changing the content model while
iterating, you might want to try one of:

a) cache *both* the index and the current node, and when you want the next node,
call IndexOf (which is O(1)) on the cached index, and if it's *not* the current
node, do the O(N) search

b) in DEBUG mode only, cache the current node, and always cache the index, and
in DEBUG mode only do the O(1) check and Assert if it fails.  This would prevent
callers from triggering lots of O(N) searches.
dbaron: I don't understand "call IndexOf (which is O(1)) on the cached index".  
IndexOf takes nodes, not indexes.  And it's never O(1) unless you are asking for 
the index of the first child.  I think you mean something else, but I'm not sure 
what...
Sorry, I meant "ChildAt" instead of "IndexOf" in part (a) of my last comment.
Keywords: nsbeta3
beppe:  Please do not remove nsbeta3 nominations made by others.
Keywords: nsbeta3
well, perf bugs are in m19 because we are addressing them after the correctness 
and polish bugs, hence I removed the nsbeta3 keyword, since you want the keyword 
in -- marking nsbeta3-

NOTE: will readdress after correctness and polish bugs are addressed
Whiteboard: [nsbeta3-]
This profile still shows that most of the time is spent in the iterator.  It
looks like we probably are doing unnecessary work on every successive find.

In the original profile, of searching a large document for a string it did not
contain, most of the time was spent in nsContentIterator::Next, which was called
(roughly equally) by:

nsTextServicesDocument::FirstTextNodeInNextBlock(nsIContentIterator *)
nsTextServicesDocument::CreateOffsetTable(nsString *)

In the new profile finding text later on the same line, about 2/3 of the
iteration time is spent in nsContentIterator::Next, all of which was within
nsTextServicesDocument::CreateOffsetTable(nsString *) and about 1/3 of the time
was spent in nsContentIterator::Prev, all of which was within
nsTextServicesDocument::FirstTextNodeInCurrentBlock(nsIContentIterator *) .

This makes me think that probably the usage of the iterator could be improved
too...
Find is rendered useless the further down the file you go; it basically comes to a
halt. For messages or composed pages, if a find is invoked and the search word
is repeated through the file, each find makes the next one slower. In one test
with a page of approx. 700 words, with the word "help" repeated within the text
50 times, the result was a wait of 15-17 seconds between words. That is a major
problem.
Severity: normal → major
Keywords: rtm
Priority: P3 → P2
Whiteboard: [nsbeta3-] → [nsbeta3-][rtm+]
Find needs to be entirely rearchitected anyway.
Priority: P2 → P1
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm+][p:1]
Not setting rtm +/- until I understand the scope of the fix. If a viable 'patch'
that is low risk can be written to resolve this, then it will be considered. If
a complete rearchitecture needs to be done, then this will need to wait until
post-rtm.
Whiteboard: [nsbeta3-][rtm+][p:1] → [nsbeta3-][p:1]
just talked with kin, he believes it is a low risk fix for a very visible 
problem with a highly used feature.

Kin, please include the required information per the rtm checkin rules
Whiteboard: [nsbeta3-][p:1] → [nsbeta3-][p:1][rtm+ NEED INFO]
Simon and I did some looking at this. It turns out the find dialog is stateless, 
so every time you press find, the find code uses the 
TextServicesDocument::PrevBlock() method to count how many blocks from the 
beginning of the doc the current block is, just in case it has to wrap around. 

Going backwards in the document is very expensive because the content iterator 
can't find the previous content node without calling IndexOf. IndexOf can get 
really expensive especially in big flat documents.

I'll need to provide some method that can give the find code the current block 
index without having to resort to going backwards through the document.
I believe the search engine never "wraps around"? If you mean that at some point
it would start searching from the beginning again, that never happens. To test:
search for a common word, all occurrences in a page. Then change the word it
searches for and search again. It will only beep, even if the word is mentioned.
When it reaches the "end of search" of the first search, it never "leaves" that
last spot it got to; subsequent searches are only performed in the content
*after* it, not "wrapping" to search from the start of the page again.
It should wrap around, and it certainly worked for me. Did you check the 'wrap'
checkbox in the dialog?
Oops... silly me. I assumed it would autowrap. (4.7* almost does that, though it
prompts first.)
removing + per pdt sw rules
Whiteboard: [nsbeta3-][p:1][rtm+ NEED INFO] → [nsbeta3-][p:1][rtm NEED INFO]
Moving milestone to Future for now.
Target Milestone: M19 → Future
removed need info and set to -
Whiteboard: [nsbeta3-][p:1][rtm NEED INFO] → [nsbeta3-][p:1][rtm-]
*** Bug 61965 has been marked as a duplicate of this bug. ***
adding myself to CC, hope I can help out.
Keywords: nsbeta3, rtmnsbeta1
OS: Linux → All
Hardware: PC → All
Whiteboard: [nsbeta3-][p:1][rtm-]
*spam* m0.9
Keywords: mozilla0.9
Target Milestone: Future → mozilla0.9
FYI, when I added the replace feature, I made it so that the find code doesn't 
iterate backwards to figure out where in the doc it is, if the wrap around 
checkbox is unchecked. This makes searching much quicker.

I will look into fixing the wrap around case.
Goodies.
Because a major problem with the search feature is that I have to CLICK in a
page before search even knows where to search. If I don't give a web page focus
with a click before a search, search will find nothing, regardless of whether
wrap is checked or not.

The problem with this is that a search after a "click in a page" will "sense"
that click as an insertion point of sorts, and start searching FROM there
onwards, missing all previous occurrences of a word.
The focus problem you describe is a front-end (nsFindComponent) problem, which
belongs to law@netscape.com. Perhaps you should file a bug/RFE on that?

Note there are some interesting issues related to just doing a find automatically
in the browser content area ... for example, which frame do you give it to if the
content area contains multiple framesets?
How about "if focus is not indicated, search them all"?
And then let focus be indicated by a left-click in a frame, or by the frame in
which the user brings up the context menu to "search in frame".
this is a performance-oriented bug; please file a separate bug for focus issues
I'll try to get to this next milestone. (Mozilla0.9.1)
Target Milestone: mozilla0.9 → mozilla0.9.1
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Whiteboard: [perf]
moving to 1.0 for perf work
Target Milestone: mozilla0.9.2 → mozilla1.0
(I thought I mentioned this earlier)

As dbaron mentioned, nsContentIterator needs to be rewritten, I suspect. This is
really obvious on stuff like lxr, because we have a very wide and flat content
tree. I tried a one element cache, but that didn't help, because the links are
two levels deep (the <a> + the text node).

We need to keep a stack of {offset, ptr} pairs, and revalidate (in O(1), from
the nsVoidArray), to deal with dynamic content. Does that sound about right?
We don't have to do anything for dynamic content, because the iterator is
documented not to be safe to use in the face of DOM changes while iterating.

If we need to speed up the iterator, it can be done similarly to the way the
subtree iterator was improved (and similar to the way copying a range was done
in nsDocumentEncoder).

But it's the statelessness of Find that is making it slow.  It could be quick to
find adjacent matches with the current iterator.
*** Bug 84652 has been marked as a duplicate of this bug. ***
Whiteboard: [perf] → [perf][nav+perf]
I made a test...

Trying to look for "Rosen" in the LXR view of nsCSSFrameConstructor.cpp, I
left the machine running. The string is on the 21st line or so. What a hard
thing to find... It took 17 seconds to find it!

And I think I know why (suspense :-)

The LXR view of the source is mostly made of a PRE element of ***13476***
lines (as of 2001-06-06).

|nsFindAndReplace::DoFind| calls |nsTextServicesDocument::GetCurrentTextBlock|,
itself calling |nsTextServicesDocument::CreateOffsetTable|.

And that last method just seems to build a string made of the aggregation of ALL
the text nodes in the PRE element!!!!

We should only aggregate text nodes until the total length is for the
first time >= the length of the string we look for. If the string we look for
is not found in that aggregation, remove the first text node from the aggregation
and add the next text node in traversal order. Of course, generated nodes are
excluded, and the algorithm should reset the aggregation at the boundaries of
block-level elements and replaced elements.

That should reduce the time spent finding "Rosen" in
nsCSSFrameConstructor.cpp from 17 seconds to something quite negligible.

excellent debugging 
We *should* search in generated content, since the user doesn't know whether it 
is real content or not.
Blocks: 93204
This is a VERY user-visible performance problem (and not a minor problem; it
makes Find close to useless and almost always aggravating); adding topperf
(IMHO).  Nominating for mozilla 0.9.4.  There's lots of good analysis here. 
Someone needs to make this get fixed.
Ok, here's some analysis:

Obviously IndexOf is the main culprit here.  The question is why (since IndexOf
is pretty optimized).

The answer is that nsContentIterator::Next() has some optimizations to cache
indexes for the pre-order case (parent, then children), but none for the
post-order case (children, then parent (depth-first)).  And, of course, there
appear to be NO uses of pre-order traversal.

The uses in the jprof all come from nsTextServicesDocument.cpp (which is part of
the Find and Replace code).

I may have some sort of patch for this tomorrow.
I have a patch completed.  It doesn't help find much because
nsTextServicesDocument::GetFirstTextNodeInNextBlock() calls PositionAt()
constantly.  My change required PositionAt to use IndexOf() to rebuild the stack
of indexes.  ARGH.

I'm going to fix GetFirstTextNodeInNextBlock() to clone the iterator (then throw
away the clone) instead of using PositionAt() to rewind it.  (and add cloning
code to nsContentIterator).

Sigh.
The patch I attached is work-in-progress (though pretty good).  It seems to work
perfectly and is O(1) for Next/Prev, however Find isn't much faster - because
nsTextServicesDocument calls PositionAt all the time, and it forces me to
regenerate the stack.  (*Sob*)

I'm going to solve this two ways: first add ability to clone the iterator
(generally useful), and modify nsTextServicesDocument to use a clone instead of
using PositionAt to rewind the iterator.  Second, I'm going to design (and maybe
even implement) a smarter PositionAt rebuilder that rebuilds until it hits a
common ancestor.  This may not be needed if PositionAt isn't used often after I
make the first fix.
*** Bug 93204 has been marked as a duplicate of this bug. ***
Worse yet: nsTextServicesDocument uses PositionAt EVERYWHERE.  So I'm going to
have to implement a fast PositionAt.  Still working on it.
Summary: Find on Page extremely slow → Find on Page [and in view source] extremely slow
Really, even once that is fixed, there will still be fundamental problems with 
the current code, since IIRC what it's using the iterator to do is copy the 
entire HTML document into a string and build up data for going in reverse from a 
position in the string back to the content nodes.  What we really need is an 
faster iterator that can be used to lazily walk through the document while 
looking for the required string.  (This algorithm should be designed so that the 
whitespace-condensing that needs to happen for Find is easy to do.)  It would 
also be good if the searching algorithm were better than O(N*M) in the worst 
case.  Perhaps it would be good to use a Unicode-friendly variant of the 
superlinear algorithm (|js_BoyerMooreHorspool|) used in jsstr.c (which is O(N/M) 
in the usual case, and O(N) in the worst case), although see also bug 82054.
Just to let you know, guys, that the owner of the Text Services code, Kin, is on 
vacation until next week.
This is a current jprof (without any improvements to ContentIterator).  This
only covers the actual searching.  As you can see, it's _quite_ clear.  (I
wanted to re-verify the older jprofs, and as a baseline for improvements.)  
NOTE: that's not a final (or even compilable) patch, it's meant to be for
discussion/review.  Ignore (for now) the #if 0'd out save/restorestate code, and
the code for Clone() (though that may be useful in the end).

The base change is to keep a stack of indexes.  The trick is to make
PositionAt() fast, since it's called constantly by nsTextServicesDocument.  This
patch does a smart rebuild of the index stack, looking for the lowest common
parent node and stopping the rebuild there.  Since almost all uses have the new
position near the old one, this should be close to optimal.  (Even better would
be to not have to do it, but that would require a fancy save/restore state
interface (the one commented out isn't fancy enough), or extensive or total
rewrite of nsTextServicesDocument.)

Note: the last set of changes to this have been compiled but not run.  (The
previous version has been run.)  I may be able to jprof this later tonight;
tomorrow at the latest.
Success!  I now have a patch that's _dramatically_ faster.  On a bunch of
searches, IndexOf didn't even show up on the jprof, and the response to a search
was basically instantaneous, even at the end of a 350K jprof file.

NOTE: Something EVIL is being done in nsTextServicesDocument when you click Wrap
Around.  Click that, and performance goes down the tubes - even if it doesn't
wrap over the end in the search.  I'm going to profile that and file a bug
against TextServices; that's not a bug in the iterator.

I'm cleaning up my previous patches (there was a minor bug in that last upload,
by the way) and will post here for r=/sr=.
I was so happy to see progress here I briefly looked at the code, and here are
some comments (this is not a full review, this is not my area):

1. You might want to guard against null parameters to functions.
2. In nsContentIterator::Clone() I think you should return out of memory error
if new did not succeed.
3. I believe typecasting 0 to (void*) is not needed, but it does no harm either.
More like a question of style.
4. Some functions return not implemented error. If they are meant to never have
implementation it would be nice to say so in a comment in the function, and if
they are supposed to never be called you'd better add an assertion.

Thanks Heikki.

Null parameters: I didn't explicitly add any checks, and I don't think I added
any vulnerabilities to null parameters.  I'll make a pass over and add any checks
that seem to make sense.

out-of-mem error from clone(): probably; I'll check it.

Typecast ((void *) 0) is to quiet compilers about passing an int to something
that wants a void*.

The Not Implemented errors are for Clone(), which we may want to implement
someday for those classes.  I may add an assertion, though, to warn someone they
need to implement it if they happen to start calling it.
rjesup: couple of comments on your comments replying to heikki's comments :-)

- null parameter checks: no need if you trust callers to be C++ only code where
there is no interface confusion or programmer incompetence (and the latter can't
be cured completely with null checks, as there are many bad pointer values!).
If a caller might be JS or another such language, in params that should be
interface pointers might be null, so it pays to check in scriptable interface
method implementations.  In no case should inout or out parameter pointers (NB:
not pointer parameters!) be null-checked.

- (void *)0: please use nsnull When In Rome.  nsnull expands to 0, which is
always valid in a pointer context.  If you need to disambiguate the type of an
actual argument to an overloaded method, please raise a red flag -- we should
unoverload such methods on general principles.  Also, (void*)0 is not compatible
with all null pointers, according to the standard.  Function pointers, IIRC, may
not be compatible with void pointers.  Not germane to your uses of (void*)0
here, but worth mentioning to spread the news.

- not implemented failures: use NS_NOTREACHED("method-name-here"), FYI in case
that assertion variant wasn't familiar to you.

My comments on the patch:

1.  Why nsAutoVoidArray mIndexes?  What are the stats (mean, variance) on its
population?  I don't doubt that ancestor lines are typically short, but I'd like
to see numbers for top100 or similar docs.

2.  Why strongly type the nsAutoVoidArray *aIndexes parameter, when it could be
an nsVoidArray*?  Should it not be const nsVoidArray& instead, to better show
that it is an in param?  Ok, if the answer to 1 is "the population almost always
fits in an nsAutoVoidArray", then it makes sense to use the narrowest type, but
nsAuto* in a formal parameter type is usually over-narrow.

3.  NS_ERROR_OUT_OF_MEMORY, not NS_ERROR_FAILURE (er, what heikki said -- sorry,
didn't mean to repeat).

4.  Ah, I'm slow: you want to cast small-ish integers to void* to store them in
an nsVoidArray.  Too bad we never unified the Uint32Array alecf and/or roc were
hacking on under xpcom/ds.  There are nasty 64-bit portability issues here, see
bug 20860, and maybe you can beat that patch in and help it by including the
NS_PTR_TO_INT32 and NS_INT32_TO_PTR macros in your patch here?

5.  Style nit: bracing then against dangling else may be prudent (I think not),
but if the else clause returns, why not invert, so that instead of

+    if (!mPre)
+    {
+      *aSibling = parent;
+    }
+    else
+      return GetNextSibling(parent, aSibling);

you have

+    if (mPre)
+      return GetNextSibling(parent, aSibling);
+    *aSibling = parent;

Same for the GetPrevSibling case later, and maybe for similar if/else structures
that do not contain early returns, but which might want to test mPre rather than
!mPre for symmetry.

6.  Another style nit, already in the code before your patch but I thought I
would ask: why is nsCOMPtr<nsIContent>() better than nsnull, these days?  Long
ago, one needed null_nsCOMPtr() or an otherwise default-constructed temporary,
but no longer.

7.  Any thoughts on whether nsDeque should be revised to extend nsVoidArray?  At
the least, the stack-like uses in your patch could use some Push, Pop, Top, and
ReplaceTop sugar-methods.

Thanks for digging into this -- keep the patches coming!

/be
More information on why Wrap-Around is slow: it's spending ~85% of its time in
nsTextServicesDocument::PrevBlock for some unknown reason.  I can't imagine why
it would do this, since I'm searching forward, and I never actually wrapped.
I'll file a separate bug against that.

At this point, unless I believe there's reason not to, I use nsAutoVoidArrays. 
In this case I think it's fairly likely it'll fit.  Also, I doubt we'll have
more than a couple of nsContentIterators live at any given time.

The strong typing on the Constructor method probably isn't needed, but that's
really an internal-only constructor for use from Clone() (in fact, I should make
it protected or private I think).

I'll look at using those macros.  nsVoidArrays are used for integers elsewhere
as well; and yes I agree it's annoying.  (Better would be for it to be a
template class to provide type-checking/etc - sicking was going to spec out a
design and post it to n.p.m.xpcom).

Style nits: I'll look at them.  I coded it that way because it fit the old logic
better, and so it was easier for me to check the correctness of my rewrite. 
I'll add the nsnull's.

I've never had a chance to look at deque...  Sounds like it would be a good
merge with voidarray perhaps.

Thanks brendan.
If 'wrap around' is on, it has to call nsTextServicesDocument::PrevBlock a bunch 
of times to get the index of the starting block, which is remembered so that 
searching can stop when that index is reached again after searching the entire 
document.
This is the culprit for wrap around: 

nsFindAndReplace::GetCurrentBlockIndex()

which is basically:

  PRInt32 blockIndex = 0;
  do {
    aDoc->PrevBlock();
    blockIndex++;
  } while (/* not yet back at the start of the document */);

As you can see, that's REALLY bad when you're deep in a large document.

It's accentuated a LOT by the implementation of
nsTextServicesDocument::PrevBlock, which calls the various
{Get,Find,}FirstTextNodeIn{Prev,Current,Next}Block() many times to do a
PrevBlock.

Even a trivial (and still expensive) recoding of the above to iterate forwards
from the start until it was in the same block would be vastly faster.  The best
solution would be to modify the interface to nsTextServicesDocument to support a
wrap-search more directly, so we don't have to find the current index in
nsFindAndReplace the very hard way.
Brendan: I left the if (!mPre)/etc code alone.  I think in context it makes more
sense (to the eye, given the clauses above and below) the way it is.  It also
should be just as efficient given a reasonable compiler.  Ready for r/sr I
believe.  Just about everything else mentioned was done; I couldn't find any
additional places null checks are needed (there are quite a few already).
Why no 'const nsAutoVoidArray& aIndexes' parameter type change for
nsContentIterator's ctor?  This is a little more substantive a nit than the
others I picked, I think, although it's not a big deal.

/be
Forgot that one, sorry.

imoT found a bug introduced in selection by this (nsContentSubtreeIterator
didn't properly update the index stack).  Fix for that is in testing.
The subtree bug is semi-fixed.  No longer crashes, generally works but produces
one weird behavior.  I'll update the patch tomorrow once I've shaken it out for
a day or so.  
Ok, that patch updates all the subtree code; it was making assumptions about
how GetNextSibling/etc. work that I broke.  I've verified that drag selection
works correctly again (imoT's problem) and all highlighting.  I also ran debug
code (in there, compiled out) to verify all nodes returned in subtree Init and
Next against an unmodified nsContentIterator, and they all check out correctly.

r=/sr=?
Blocks: 91351
Updated per kin's comments on IRC.  I also found by inspection a very subtle
issue about how end nodes are handled in subtrees that may have been entirely
accidental.  I restored the old behavior and marked it with XXX and comments.
kin is reviewing the latest patch (emailed to him); he'll respond tomorrow.  If
it's ok by him I'll add it here.  Sorry about all the patches, I was modifying
and uploading as he made requests.
A couple of questions/comments/issues I see so far in the 08/17/01 09:46 
attachment:



* First() and Last() don't reset mIndexes. They can be
  called at any time just like PositionAt().



* nsContentIterator::Init() doesn't build up an mIndexes
  array necessary for an immediate call to Next() or Prev().
  Imagine the following example:

    <div><b>bold</b> some text <i>italics</i></div>

  The DOM tree for the above example would look like this:

              <div>
                |
  +-------------+-------------+
  |             |             |
 <b>      " some text "      <i>
  |                           |
"bold"                    "italics"


  Now suppose the start container and offset in the range were
  the text node containing "bold" and the end container and
  offset were the text node containing "italics". Walking through the Init()
  code by hand, I think when we exit, we have mCurNode == the "bold"
  text node and mIndexes contains only one item, a zero. Wouldn't that
  create an error situation if Next() was then called, since there
  aren't enough indexes in the array to get to " some text "?



* In nsContentSubtreeIterator::Prev() we can avoid the expensive
  calls to RebuildIndexStack() if we just used a temp index array
  which starts out as a copy of mIndexArray. Then we could just
  override the contents of mIndexArray with the contents of the
  temp array in the places where things succeeded. Isn't copying
  the index array much cheaper on average than rebuilding the
  indexes?
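A rough sketch of this copy-then-commit suggestion, using std::vector as a stand-in for the real index array (TryPrev and PrevResult are hypothetical names): work on a scratch copy, and only overwrite the real stack once the step has succeeded, so a failure never forces a rebuild.

```cpp
#include <cassert>
#include <vector>

// Hypothetical copy-then-commit shape: a failed step leaves mIndexes
// untouched, so no RebuildIndexStack() call is ever needed.
struct PrevResult { bool ok; std::vector<int> indexes; };

PrevResult TryPrev(const std::vector<int>& mIndexes, bool stepSucceeds)
{
    std::vector<int> scratch(mIndexes);      // cheap copy, not a rebuild
    if (!stepSucceeds)
        return { false, mIndexes };          // commit nothing on failure
    if (!scratch.empty())
        --scratch.back();                    // e.g. step to previous sibling
    return { true, scratch };                // commit the scratch copy
}
```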



* GetTopAncestorInRange() seems to remove everything except the
  first item in aIndexes while parent is in the range. I'm not sure
  that's correct? We need to keep in mind that mCommonParent can
  be higher in the parent hierarchy than the outAncestor returned
  by this method, so the number of items in the index array
  when we return should be equiv to the number of parents between
  and including mCommonParent and outAncestor, right? This would
  ensure that we had enough info in the index array to avoid
  calling IndexOf() when using the various utility methods.



* The more I look at this, the more I understand jfrancis'
  (the original author's) code grouping. GetNext/PrevNode() was
  supposed to be a utility method that decided what to return
  next based on Pre/Post iteration, and GetNext/PrevSibling()
  was supposed to be a simple utility method that got the
  equivalent of the next sibling as if the entire tree were
  flattened. With your changes some of the Pre/Post iteration
  is still done in GetNext/PrevNode and some of it is done
  in GetNext/PrevSibling() which I think kind of clouds the
  logic. Also, when used by the SubtreeIterator, most of this
  Pre/Post code in GetNext/PrevSibling() is turned off,
  basically restoring it to what jfrancis originally had.
First, Last: Obvious mistake, sorry.  Fixed to basically just call PositionAt(),
which does an optimal partial rebuild of the index stack.  Note: this meant that
the SubtreeIterator needed to support PositionAt (or rather, not override it).
I couldn't see any reason not to.  If we want to keep PositionAt non-public in
the subtree iterator, we could make it protected, or override First and Last and
do rebuilds, or just decide it's not important and do rebuilds for First/Last in
all cases.

nsContentIterator::Init(): I think you're right, because of the call to find the
common ancestor.  Now it ignores the index array until it either calls
MakeEmpty(), or it succeeds and calls RebuildIndexStack.

nsContentSubtreeIterator::Prev: the RebuildIndexStack calls are more expensive
than copying an array, but they should only happen when we hit the beginning of
the subtree or something else goes wrong.  Copying the index array on every Prev
call would, IMO, be more (total) overhead than rarely rebuilding the index stack.

GetTopAncestorInRange: cut-and-paste typo.  It was only being called with an
actual array parameter from Prev(), and most code doesn't call Prev on the
subtree iterator.  That should have been:
      aIndexes->RemoveElementAt(aIndexes->Count()-1);
(Remove the last element)

The reasoning for moving code from Next/PrevNode() to GetNext/PrevSibling() was
to unify the code; at the time I did it I was concentrating on nsContentIterator,
and didn't realize that the SubtreeIterator used them as well.  I did it mostly
to unify the logic, which was almost identical for Pre and Post, with just minor
differences.  BTW, there is NO code that uses Pre that I can find.

Given that Subtree needs what amounts to the Pre version of GetNextSibling (and
the Post version of GetPrevSibling), it may make sense to use just the one
parameter, so NextNode becomes:
...
  // else next sibling is next
  return GetNextSibling(cN, mIndexes, ioNextNode, mPre);
}

and change the conditionals in GetNextSibling from (!mPre && !aNoParents) to
(!aNoParents).  (Similar for GetPrevSibling).

Or we can simply re-split GetNextSibling(), and have a bunch of (almost)
duplicate code in NextNode.  (Ditto for GetPrevSibling() and PrevNode).  I tend
to prefer parameterized code to duplicate slightly-variant code, but that is
just personal preference.  (I find it easier to maintain when you don't have to
make similar changes in several different places.)


My apologies for missing some of these things, and making so many go-rounds.  I
was very focused on finding some way to save cycles in what was a really nasty
hotspot, and didn't take as much time to look over the whole usage as I should
have, plus I started out not understanding how this class was used.  The patch
is much better now due to all your comments I believe.  (And imoT's testing)
Reading through the comments here, I think one assumption people (including me)
have (mostly) been making, that no one changes the content tree out from under
the iterator, is incorrect, at least in editor and maybe others (compose).

DeleteSelection(), InsertText(), etc modify the content tree.  We have to adjust
any cached indexes when this happens.  I'm working on this - I plan to add a
RebuildState() method to be called after modifying content (such as from
nsTextservicesDocument::AdjustContentIterator(), or ::InsertText() (add a call
to Adjust, etc)).

Alternatively, I could add some double-checking all the time (verify the indexes
using GetChildAt() on _every_ call, all the way back up the stack).  This would
obviously be expensive, though not as bad as the original code (IndexOf).

Or we could set a bit in the contentiterator to tell it to rebuild on the next
call (a bit of a pain).  Or allow the iterator creator to specify if we keep an
index stack (and thus guarantee to not change the content, or to tell us if it
does).  

The question becomes: do we know who may modify the document while we're
searching it?  What about dhtml?  Find in a still-loading page?  How does
deletion/insertion affect subtree iterators (and in particular their other
start/end arrays)?  TextServicesDocument should be not too hard (modulo the
range issue).

MAN O MAN do I wish that content nodes were doubly linked to siblings.  Then
this would ALL be moot.
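For illustration, a minimal sketch of what doubly linked siblings would buy (hypothetical Node, not the real nsIContent): backwards steps become single pointer loads, so no index stack or IndexOf scan is needed at all.

```cpp
#include <cassert>

// Hypothetical content node with a back-link to its previous sibling.
struct Node {
    Node* mPrevSibling = nullptr;
    Node* mNextSibling = nullptr;
};

// Make b follow a, maintaining both directions of the link.
void Link(Node& a, Node& b)
{
    a.mNextSibling = &b;
    b.mPrevSibling = &a;
}
```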
BTW, thinking about it, the "check on each call" isn't _that_ expensive.  It's
like this:

if (GetChildAt(parent, mIndexes[mIndexes.Count()-1]) == mCurNode)
   indx = mIndexes[mIndexes.Count()-1];
else
   indx = parent->IndexOf(mCurNode);
indx++;

...
mIndexes.ReplaceElementAt(mIndexes.Count()-1, indx);

This would only call IndexOf if our index was wrong, and GetChildAt isn't _too_
bad (though it still has a cost).  It's also guaranteed safe in the face of content
changes, so long as the current node (and parents) isn't deleted, which is
already dealt with in TextServicesDocument.
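A self-contained sketch of this check-on-each-call scheme, with std::vector<Node*> standing in for a parent's child list and NextIndex as a hypothetical helper (not the actual nsContentIterator code): the linear IndexOf scan only runs when the cached index turns out to be stale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {};  // stand-in for a content node

// Verify the cached index before using it; fall back to a linear scan
// (the moral equivalent of IndexOf) only on a cache miss.  Returns the
// index of the next sibling of mCurNode.
int NextIndex(const std::vector<Node*>& children, Node* mCurNode,
              int cachedIndex)
{
    int indx = -1;
    if (cachedIndex >= 0 &&
        static_cast<std::size_t>(cachedIndex) < children.size() &&
        children[cachedIndex] == mCurNode) {
        indx = cachedIndex;                          // cache hit: O(1)
    } else {
        for (std::size_t i = 0; i < children.size(); ++i) {
            if (children[i] == mCurNode) {           // cache miss: scan
                indx = static_cast<int>(i);
                break;
            }
        }
    }
    return indx + 1;                                 // advance to next sibling
}
```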

BTW, that issue (deletion of the current item requiring resetting the iterator
position) means that the only issues are insertion (which is always safe), and
places that are already calling PositionAt (or equivalent) in case the current
node is no longer valid. This means basically it's only TextServicesDocument we
need to worry about.   I think.
While I still haven't actually looked at the patch, that suggestion seems fine
for verifying the tree. I forget who I discussed this with on IRC a couple of
months ago (smfr, maybe?), but I think that's what we came up with.

GetElementAt should be O(1) - it's a simple array lookup. You really can't get
much faster than that, I think. If nsVoidArray::ElementAt were moved into the
header file so that it could be inlined, you'd probably even eliminate the
function call and indirection (mImpl), turning this into one addition + one
pointer dereference. Is there a reason that's not done?
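The inlining point can be illustrated with a minimal stand-in for nsVoidArray (not the real class): with ElementAt defined in the header and marked inline, the call boils down to a bounds check plus one add and one dereference.

```cpp
#include <cassert>

// Toy stand-in for nsVoidArray: ElementAt is defined inline, so a call
// compiles down to a bounds check, one addition, and one dereference.
class VoidArray {
public:
    VoidArray(void** aData, int aCount) : mData(aData), mCount(aCount) {}

    inline void* ElementAt(int aIndex) const
    {
        if (aIndex < 0 || aIndex >= mCount)
            return nullptr;
        return mData[aIndex];   // one addition + one pointer dereference
    }

private:
    void** mData;
    int    mCount;
};
```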

What if the current node is reparented, so that the cached parent is still at
the same place, but we are not its child anymore? Can this happen? (drag and
drop?)

wrt generated content, see the 2001-06-03 01:05 comment.
Since we'd only be verifying the current node's index, there's no problem with
the parent's index being moved.  The parent's index in the array would be wrong,
but that doesn't matter until we go up a level - and then we'd end up
reverifying that index with its parent. This would be O(1), as you say, though
with a larger K - but not large.

Note that if there were a chance that the current node were deleted, code using
the iterator would have to use PositionAt (as DeleteSelection does).  The only
users of PositionAt are TextServicesDocument (which we understand - there are
two things that modify the content there; DeleteSelection and InsertText, though
it's possible that callers might modify it) and nsHTMLEditor.cpp, which is just
using it to set an initial position.  However, as I mentioned, callers to
TextServicesDocument might add things (or delete things other than the current
node).

The safest solution is as per my comments above - reverify the index before each
increment.  Patch to be posted.  (I'm afraid there might be an award for a bug
with the most versions of a patch...)
Patch as per my last comment; reverifies index on each call.  Note that this
does not change the issue that if you remove the current node from the tree the
iterator will become confused.  The only code I know that deletes while
iterating (nsTextServicesDocument::DeleteSelection) checks for that and
calls PositionAt() to move the iterator to a valid node.

A separate bug to inline ElementAt/IndexOf/etc has been filed as bug 96108.
Assignee: kin → jfrancis
Status: ASSIGNED → NEW
jfrancis will take ownership of this bug.  discussed in performance meeting today.
095
Status: NEW → ASSIGNED
Target Milestone: mozilla1.0 → mozilla0.9.5
I'm still plugging away at this.  Should have something worth testing in the next 
day.  I'm making RJ's index stack approach only apply to nsContentIterator.  For 
nsSubtreeIterator it shouldn't be much of a win over a trivial "one level"
cache, and it complicates the code quite a bit.  The reason it shouldn't make a
big difference is that the subtree iterator doesn't pop down and then back up
the way the regular content iterator does.  Example:

You are iterating over:

0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
|       |       |       |       |       |       |       |       |       |
0       0       0       0       0       0       0       0       0       0

In the content iterator, you will be constantly diving down into the children,
and then back up to the parents (assuming a post-order iterator).  With the
subtree iterator, you will never touch the children, and will just hit the
top-level parents in order.  So a simple cached index (instead of a stack)
should be sufficient and less complicated.
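A toy sketch of the single cached index idea for the subtree case (hypothetical shape, not the real nsContentSubtreeIterator): since only the top-level nodes are ever visited, one integer replaces the whole index stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical subtree iterator: it never dives into children, so a single
// cached index of the last position is enough - no index stack required.
struct SubtreeIter {
    std::vector<int> topLevel;   // stand-ins for the top-level nodes
    int mCachedIndex = -1;       // one-level cache

    const int* Next()
    {
        ++mCachedIndex;
        if (static_cast<std::size_t>(mCachedIndex) >= topLevel.size())
            return nullptr;      // walked off the end of the subtree
        return &topLevel[mCachedIndex];
    }
};
```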
This patch is a more minimal implementation of RJ's index caching scheme.  I get
underflows when testing it with Find.  Selection and editor actions (which both
beat on the iterators) seem to work fine, though.

I'll debug the underflow issue, but I wanted to post the patch now so others
could begin commenting.
The underflow is from PositionAt() I'll bet, since that changes mCurNode, and I
don't see any mods to it.  That might be ok - the code can rebuild the stack in
a lazy manner after PositionAt; it might even be more efficient given the
frequent PositionAt calls - however, since PositionAt usually doesn't move the
iterator far, my PositionAt code was pretty efficient at rebuilding the tree.  My
tests indicated that PositionAt is called so often that even a simple "IndexOf()
back to the root" on each one was way too expensive, which was why I made the
more complex "rebuild only changed part of the index stack" version.

If you do a lazy rebuild, though, you'll need to at least clear the index array
in PositionAt.  I advise benchmarking it against mine before finalizing on that.
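For comparison, a toy sketch of the lazy-rebuild alternative (hypothetical LazyIter, not the real iterator): PositionAt only invalidates the cached indexes, and the next traversal call rebuilds them on demand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical lazy-rebuild scheme: PositionAt() clears the cached index
// stack instead of rebuilding it; the next Next() rebuilds on demand.
// Positions here are a single flat index, for illustration only.
struct LazyIter {
    std::vector<int> mIndexes;   // cached path; empty means "stale"
    int mPos = 0;

    void PositionAt(int aPos)
    {
        mPos = aPos;
        mIndexes.clear();        // lazy: defer the rebuild
    }

    void Next()
    {
        if (mIndexes.empty())
            mIndexes.push_back(mPos);   // rebuild on demand (trivial here)
        ++mIndexes.back();
        mPos = mIndexes.back();
    }
};
```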
Thanks for feedback.  I hadn't examined kin's textservices code and didn't know 
he was using PositionAt.  I'll mix back in your work there - don't want to remove 
juicy bits when we need the juice!
latest patch.  This one is pretty real, functionally.

I have incorporated RJ's PositionAt() optimizations.  Thanks to RJ for doing all
the heavy lifting here.  I also removed the two places I accidentally left
unneeded IndexOf() calls in my previous patch.  Seat-of-pants testing on large
lxr files reveals that the performance of this patch is on par with RJ's final
patch.

I also didn't have any correctness problem in the editor or selection, or even
hit any cache misses doing Find.

Kin has some changes he wants that I have not yet incorporated.  If folks would
start testing and putting any requested changes into the bug, that would be great.
Looks good to me; I'd be willing to r= it (having been through the code
thoroughly before).
*** Bug 99881 has been marked as a duplicate of this bug. ***
this latest patch incorporates review requests from Kin.  I think we are ready
to rock.  Kin, are you ready to sr?
Whiteboard: [perf][nav+perf] → [perf][nav+perf] fixinhand; need r,sr
Comment on attachment 49530 [details] [diff] [review]
content/base/src/nsContentIterator.cpp take#3

Looks good to me
r=rjesup@wgate.com
Attachment #49530 - Flags: review+
The above patch has some final tweaks from Kin's super-review.  Cleanup,
comments, and minor changes only.
Comment on attachment 49844 [details] [diff] [review]
content/base/src/nsContentIterator.cpp

sr=kin@netscape.com
Attachment #49844 - Flags: superreview+
Attachment #49844 - Flags: review+
Whiteboard: [perf][nav+perf] fixinhand; need r,sr → [perf][nav+perf] fixinhand; ready for checkin
Can we commit this soon?  I think we're done with r/sr...
fix landed on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I tested this with great results on:
  mozilla.org
  View Source of mozilla.org
  slashdot.org
  View Source of slashdot.org
  lxr.mozilla.org/mozilla/source/widget/src/windows/nsWindow.cpp

It takes about a second to search for "e" on the lxr page, but that's almost
6000 lines of code.  It took the same amount of time to search for
"ScheduleHookTimer", which doesn't appear until line 5634.  I'm running a
PII-500.  I find the time acceptable, given the enormous size of the page.

I wanted to test this on View Source of the lxr page, but couldn't because View
Source of that page maxes out my CPU.

So, looks really good on Windows using 2001092403.  Anyone for verifying this
using Linux and Mac?

Nice work, guys!  This fixes one of the large usability complaints I have with
Mozilla.
A vast improvement, testing on Linux, new CVS.

Ran a query in bugzilla that returned 6583 hits: bugs of all resolutions except
closed, with "window" in summary.  (BTW, there was no hang after it had loaded!)

Then searched for the word "work".  Unlike before, there was now no slowdown
when finding the next occurrence of the word: the time it took between finding
the 399th and the 400th occurrence was as short as between finding the 99th and
the 100th.  And the time between clicking "find" and the word being found was
comfortably close to instant each time.

This is fixed. Thank you all.
(Damn mid-air collisions...)

On linux (k6-2 300 w/ 256MB) a search for "for" on this page took less than
1/4 of a second per click of "find", regardless of which half of the page it
was in.

View Source of this page, same search: each time it took three seconds to find
the next "for".  This was again consistent throughout the length of the "page"
being searched.

I wonder if find in View Source could be sped up? I noticed just opening View
Source on this page took several seconds (for the content to be displayed), so
maybe it's an entirely different animal. Thoughts?
A view source document, with highlighting on, is on the order of 10 times bigger
(just raw source wise) than the document it's the source of.  For documents with
lots of markup, 20 times is not uncommon.  There's just a lot more nodes to deal
with there....
Just tested on linux on
view-source:http://lxr.mozilla.org/mozilla/source/widget/src/windows/nsWindow.cpp
with syntax highlighting _on_

Searching for "ScheduleHookTimer"

Without patch: 3 mins, 40 secs
With patch: 6 secs

I'd say verified on Linux.  :)
(Damnit another mid-air...)
Yup, with highlighting turned off it's lightning fast in view source. Perhaps
something the View Source maintainers can think about some time...
Nominating for nsbranch+.  This is a significant user-noticable difference. 
(Will PDT notice this if it's marked Resolved?  I'm guessing not.  jfrancis: if
you're willing to entertain putting it on the branch, please reopen to put it on
PDT radar.  If you're not willing, just remove nsbranch+.  Thanks.)
Keywords: nsbranch+
I'm just wondering: I tried searching this bug 31770 page in the browser for the
word "the", and it is much faster with today's build on W2K than the same search
in view source.  Could this be because the page is in memory or cached?
Reopening to get PDT attention for the 0.9.4 nomination.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [perf][nav+perf] fixinhand; ready for checkin → [perf][nav+perf] fixed on trunk; not fixed on 094
Much as I agree this should land on the branch, it hasn't been awarded any
official nsbranch+ yet.  Still in nomination so fixing the keyword.

CCing jaimejr for nsbranch+
Keywords: nsbranch+nsbranch
Marking nsbranch+
Keywords: nsbranchnsbranch+
Note that bug 101710 may be a regression from these changes.
bug 101710 is a regression due to this patch; I have attached a patch to fix bug
101710 to it.
Updated status
Whiteboard: [perf][nav+perf] fixed on trunk; not fixed on 094 → ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf]
Note: the patch which fixes bug 101710 must be checked in to the 0.9.4 branch if
the patch for this bug is checked in. 

It's a little too late to take this one. nsbranch-
Keywords: nsbranch+nsbranch-
Correcting for the right process. Leaving as nsbranch+, but this is PDT-.
Keywords: nsbranch-nsbranch+
Whiteboard: ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf] → [PDT-] ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf]
marking fixed again since we are not going to land on 094 branch
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
okay, adding vtrunk kw, since this won't be checked into the branch. unless i'm
misunderstanding joe's last comment. :)

if somehow this *does* get checked into the branch, do remove 'vtrunk' so that
it can get back onto my radar for branch verification. thx!
Keywords: vtrunk
verified using boris' test case described in 2001-09-24 12:20 --tested Find in
both the web page and the source windows, for grins.

linux, 2001.10.18.08-trunk comm: 1.97s [page], 10.65s [source]
winNT, 2001.10.18.06-trunk comm: 1.34s [page], 6.28s [source]
mac 10.1, 2001.10.22.13-trunk comm: 1.44s [page], 6.19s [source]
Status: RESOLVED → VERIFIED
Keywords: vtrunk
Product: Core → Mozilla Application Suite
Component: XP Apps: GUI Features → UI Design