Closed Bug 31770 Opened 24 years ago Closed 23 years ago

Find on Page [and in view source] extremely slow

Categories

(SeaMonkey :: UI Design, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.5

People

(Reporter: spam, Assigned: mozeditor)

References

Details

(Keywords: perf, topperf, Whiteboard: [PDT-] ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf])

Attachments

(19 files)

257.35 KB, text/html
Details
89.16 KB, text/html
Details
2.40 KB, text/plain
Details
9.97 KB, patch
Details | Diff | Splinter Review
337.75 KB, text/html
Details
17.38 KB, patch
Details | Diff | Splinter Review
16.79 KB, patch
Details | Diff | Splinter Review
18.45 KB, patch
Details | Diff | Splinter Review
28.73 KB, patch
Details | Diff | Splinter Review
29.88 KB, patch
Details | Diff | Splinter Review
29.88 KB, patch
Details | Diff | Splinter Review
31.45 KB, patch
Details | Diff | Splinter Review
33.41 KB, patch
Details | Diff | Splinter Review
34.20 KB, patch
Details | Diff | Splinter Review
35.24 KB, patch
Details | Diff | Splinter Review
13.51 KB, patch
Details | Diff | Splinter Review
18.65 KB, patch
Details | Diff | Splinter Review
23.22 KB, patch
jesup
: review+
Details | Diff | Splinter Review
23.27 KB, patch
kinmoz
: review+
kinmoz
: superreview+
Details | Diff | Splinter Review
To avoid filing duplicates I searched Bugzilla for new, reopened, verified,
fixed, and unverified bugs. Got around 3600 hits. (The resulting page loaded in 5
minutes.) Now: searching with "find on page" got slower and slower; to find the
next occurrence of a word mentioned twice in one sentence took around 15 seconds
when I was halfway down the page. Almost as if it searched the whole page from the
start all over again each time the "find" button was clicked. In effect it's so
slow it's useless. I can read the page myself faster than the search tool can.
Build ID: 2000031318
search dialog bug
Assignee: matt → law
reporter - what sort of system (processor, etc.) are you on?  When you search,
how high is the CPU usage?
Intel P120, 96 MB RAM, RedHat6, somewhat upgraded. Gnome/sawmill. No problems.
I did a comparison, a little "manual benchmark".
This query was performed first in Netscape 4.7, then in Mozilla Build ID
2000031418:
http://bugzilla.mozilla.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&rep_platform=PC&op_sys=Linux&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&short_desc=&short_desc_type=substring&long_desc=&long_desc_type=substring&bug_file_loc=&bug_file_loc_type=substring&status_whiteboard=&status_whiteboard_type=substring&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=Reuse+same+sort+as+last+time


(that got long, heh...)
Well:
Netscape found all 26 occurrences of the word "ftp" (no quotes) in that page in
45 seconds.
Mozilla found them in 6 minutes, 35 seconds.

The test was performed by clicking "search" again as soon as the word was
found. Netscape 4.7 was almost 9 times faster, in other words.
What is very striking is that Mozilla gets slower and slower as it searches
further down the page. In the end it takes around 15 seconds to find the second
occurrence of ftp in a sentence looking like this:  ftp://ftp.... (etc)
And yes, CPU load was high while Mozilla searched: it uses what it can;
figures like 87% and 95% showed up in "top" when I tested it afterwards. (No
"top" was running while the comparative search was done, and the apps were
running "alone", under the same conditions.) The only "difference" was that
this was done manually, and since Netscape found the word "ftp" very quickly at
times, I may have been a little slow hitting the find button again. If it made
any difference, it is to Mozilla's discredit, unfortunately.
Confirming bug. I tried it on WinNT with a 450MHz P2 and the lag is still noticeable. Very telling is the case where ftp occurs again
only a few characters away from the last occurrence. 4.x finds this instantaneously. Seamonkey takes just as long as for
the others. I get the impression Seamonkey searches the whole page up to the point where it last found an occurrence each time
through, whereas 4.x just picks up searching wherever it left off (at the last occurrence).

I'm also changing the component and QA.
Status: UNCONFIRMED → NEW
Component: Search → XPApps
Ever confirmed: true
QA Contact: claudius → sairuh
Summary: Searh/Find on Page extremely slow → Find on Page extremely slow
h'm, I didn't really see much of a lag between hits of the Find button (searched
for the string 'ftp' in the link above from dark@c2i.net):

linux [2000.03.15.06-nb1b, mozilla] -- 2 seconds (okay, that's slow)
mac [2000.03.15.05-nb1b, mozilla] -- less than 1 second
winNT [2000.03.15.06-nb1b, commercial] -- less than 1 second

however, I did notice that when I first went to that page and initially brought
up the Find On Page dialog, it would take 2-6 seconds for the dialog to
appear. Did anyone else see that?
Yes :) But that's the least of the problems..
sairuh: A second thought... Did you go through the whole procedure, finding *all*
(26?) occurrences of the word ftp in that page?
This is probably a performance issue with nsITextServices, which kin wrote.
cc:ing kin. FYI, spellcheck in editor uses the same service, so it may also show
performance problems on long documents.
ah... I think I see what you mean: I only searched a few times, i.e., the first 3
or so instances of "ftp".

so, I just tried searching through the whole page, and what I notice is that
searches in the first half of the page take only ~1 sec... but once I'm searching
through the latter half of the page, it takes more like 2-3 sec between each
search (longer on linux than on mac or winNT). but not extremely slow from my
perspective...
I ought to look you up in the Silmarillion... Regardless of horsepower, the factors
here speak for themselves, I think: a 6.5 minute search versus 45 seconds. This can
be done much better.
Giving this to kin.  The problem has to be in the document text services.
Assignee: law → kin
Keywords: perf
Target Milestone: --- → M16
I've noticed a considerable slowdown of find as well.  It would seem to be
related to an insane amount of memory usage.  After the first find, virtual
memory usage jumps up about 60 or 70 MB, and then slowly creeps up with each
additional find.

I don't think it's leaks, per se, as all of the memory is immediately freed when
you close the find dialog.

This is with Linux on a Celeron 500, 128MB RAM.

*** Bug 36005 has been marked as a duplicate of this bug. ***
Accepting bug.
Status: NEW → ASSIGNED
Target Milestone: M16 → M17
moving to m17
*** Bug 45782 has been marked as a duplicate of this bug. ***
Keywords: nsbeta3
Component: XP Apps → XP Apps: GUI Features
setting to m18, nsbeta3
Target Milestone: M17 → M18
I did a jprof profile (to be attached) of find-in-page in nsBlockFrame.cpp (for
a string that wasn't in the file).  The basic problem here is that
nsContentIterator::NextNode (for post-order traversal) calls
|parent->IndexOf(cN, indx)|, which is O(N) on the length of the array.  On a
huge pre element, this is really really bad, because when you iterate through
the whole list, the iteration is O(N^2) when it should be O(N).

nsContentIterator needs to store the index of the current node.  If there's a
possibility that the list changes during iteration, then it should double check
that the index is still valid, and *if not*, do the O(N) search.  It's probably
worth going through nsContentIterator and seeing if other things need cleaning
up too...
Note that since that's a realtime profile (rather than time spent by the app),
g_main_poll shows up.  It should be ignored.
Cc jfrancis@netscape.com since he is the author of the content iterator.
I have no idea how to read that profile.  I'll have to reprofile this with a tool
whose output I can read.  Caching the index will make the iterator faster, but
does that explain a 15-second delay between finding two occurrences of a string on
the same line?  I smell another problem...
To see how to read the profile:
http://lxr.mozilla.org/mozilla/source/tools/jprof/README.html

What the profile is showing is that 73% of the time in Find in Page is spent in
the call to IndexOf() within nsContentIterator::NextNode.  That's clearly a
problem.
Right.  But while IndexOf is slow, that doesn't mean that's the problem.  The real
problem may be that we are iterating unnecessarily, etc.  It's hard to see how
two occurrences of the string within the same node should need to call IndexOf a
bunch of times.
moving to m19
Target Milestone: M18 → M19
for instance, nsTextServicesDocument::GetCurrentTextBlock() is spending all of
its time in the iterator.  That smells bad.  Do you want to look at that, Kin?

On another front, I'm wondering if I can get away with the caching David
suggests.  I can if all the callers are keeping their contract not to use the
iterator while changing the content they are iterating.  I think right now you are
only hosed if you delete content, but with that change I think you might be hosed
when adding content as well.

Maybe I should just try it and see what breaks.  :-P
jfrancis:  If you're worried about users changing the content model while
iterating, you might want to try one of:

a) cache *both* the index and the current node, and when you want the next node,
call IndexOf (which is O(1)) on the cached index, and if it's *not* the current
node, do the O(N) search

b) in DEBUG mode only, cache the current node, and always cache the index, and
in DEBUG mode only do the O(1) check and Assert if it fails.  This would prevent
callers from triggering lots of O(N) searches.
dbaron: I don't understand "call IndexOf (which is O(1)) on the cached index".  
IndexOf takes nodes, not indexes.  And it's never O(1) unless you are asking for 
the index of the first child.  I think you mean something else, but I'm not sure 
what...
Sorry, I meant "ChildAt" instead of "IndexOf" in part (a) of my last comment.
Keywords: nsbeta3
beppe:  Please do not remove nsbeta3 nominations made by others.
Keywords: nsbeta3
well, perf bugs are in m19 because we are addressing them after the correctness 
and polish bugs, hence I removed the nsbeta3 keyword, since you want the keyword 
in -- marking nsbeta3-

NOTE: will readdress after correctness and polish bugs are addressed
Whiteboard: [nsbeta3-]
This profile still shows that most of the time is spent in the iterator.  It
looks like we probably are doing unnecessary work on every successive find.

In the original profile, of searching a large document for a string it did not
contain, most of the time was spent in nsContentIterator::Next, which was called
(roughly equally) by:

nsTextServicesDocument::FirstTextNodeInNextBlock(nsIContentIterator *)
nsTextServicesDocument::CreateOffsetTable(nsString *)

In the new profile finding text later on the same line, about 2/3 of the
iteration time is spent in nsContentIterator::Next, all of which was within
nsTextServicesDocument::CreateOffsetTable(nsString *) and about 1/3 of the time
was spent in nsContentIterator::Prev, all of which was within
nsTextServicesDocument::FirstTextNodeInCurrentBlock(nsIContentIterator *) .

This makes me think that probably the usage of the iterator could be improved
too...
Find is rendered useless the further down the file you go; it basically comes to a
halt. For messages or composed pages, if a find is invoked and the search word
is repeated through the file, each find makes the next one slower. In one test
with a page of approx. 700 words, with the word "help" repeated within the text
50 times, the result was a wait of 15-17 seconds between words. That is a major
problem.
Severity: normal → major
Keywords: rtm
Priority: P3 → P2
Whiteboard: [nsbeta3-] → [nsbeta3-][rtm+]
Find needs to be entirely rearchitected anyway.
Priority: P2 → P1
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm+][p:1]
Not setting rtm +/- until I understand the scope of the fix. If a viable 'patch'
that is low risk can be written to resolve this, then it will be considered. If
a complete rearchitecture needs to be done, then this will need to wait until
post-rtm.
Whiteboard: [nsbeta3-][rtm+][p:1] → [nsbeta3-][p:1]
just talked with kin, he believes it is a low risk fix for a very visible 
problem with a highly used feature.

Kin, please include the required information per the rtm checkin rules
Whiteboard: [nsbeta3-][p:1] → [nsbeta3-][p:1][rtm+ NEED INFO]
Simon and I did some looking at this. It turns out the find dialog is stateless, 
so every time you press find, the find code uses the 
TextServicesDocument::PrevBlock() method to count how many blocks from the 
beginning of the doc the current block is, just in case it has to wrap around. 

Going backwards in the document is very expensive because the content iterator 
can't find the previous content node without calling IndexOf. IndexOf can get 
really expensive especially in big flat documents.

I'll need to provide some method that can give the find code the current block 
index without having to resort to going backwards through the document.
I believe the search engine never "wraps around"? If you mean that at some point
it would start searching from the beginning again, that never happens. To test:
search for a common word, all occurrences in a page. Then change the word it
searches for and search again. It will only beep, even if the word is mentioned.
When it reaches the "end of search" of the first search, it never "leaves" that
last spot it got to; subsequent searches are only performed in the content
*after* it, not "wrapping" to search from the start of the page again.
It should wrap around, and it certainly worked for me. Did you check the 'wrap'
checkbox in the dialog?
Oops... silly me. I assumed it would autowrap. (4.7* almost does that, though it
prompts first.)
removing + per pdt sw rules
Whiteboard: [nsbeta3-][p:1][rtm+ NEED INFO] → [nsbeta3-][p:1][rtm NEED INFO]
Moving milestone to Future for now.
Target Milestone: M19 → Future
removed need info and set to -
Whiteboard: [nsbeta3-][p:1][rtm NEED INFO] → [nsbeta3-][p:1][rtm-]
*** Bug 61965 has been marked as a duplicate of this bug. ***
adding myself to CC, hope I can help out.
Keywords: nsbeta3, rtmnsbeta1
OS: Linux → All
Hardware: PC → All
Whiteboard: [nsbeta3-][p:1][rtm-]
*spam* m0.9
Keywords: mozilla0.9
Target Milestone: Future → mozilla0.9
FYI, when I added the replace feature, I made it so that the find code doesn't 
iterate backwards to figure out where in the doc it is, if the wrap around 
checkbox is unchecked. This makes searching much quicker.

I will look into fixing the wrap around case.
Goodies.
Because a major problem with the search feature is that I have to CLICK in a
page before search even knows where to search. If I don't give a web page focus
with a click before a search, search will find nothing, regardless of whether
wrap is checked or not.

The problem with this is that a search after a "click in a page" will "sense"
that click as an insertion point of sorts, and start searching FROM there
onwards, missing all previous occurrences of a word.
The focus problem you describe is a front-end (nsFindComponent) problem, which
belongs to law@netscape.com. Perhaps you should file a bug/RFE on that?

Note there are some interesting issues related to just doing a find automatically
in the browser content area ... for example, which frame do you give it to if the
content area contains multiple framesets?
How about "if focus is not indicated, search them all"?
And then let focus be indicated by a left-click in a frame, or by the frame in
which the user brings up the context menu to "search in frame".
this is a performance-oriented bug; please file a separate bug for focus issues
I'll try to get to this next milestone. (Mozilla0.9.1)
Target Milestone: mozilla0.9 → mozilla0.9.1
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Whiteboard: [perf]
moving to 1.0 for perf work
Target Milestone: mozilla0.9.2 → mozilla1.0
(I thought I mentioned this earlier)

As dbaron mentioned, nsContentIterator needs to be rewritten, I suspect. This is
really obvious on stuff like lxr, because we have a very wide and flat content
tree. I tried a one element cache, but that didn't help, because the links are
two levels deep (the <a> + the text node).

We need to keep a stack of {offset, ptr} pairs, and revalidate (in O(1), from
the nsVoidArray), to deal with dynamic content. Does that sound about right?
We don't have to do anything for dynamic content, because the iterator is
documented not to be safe to use in the face of DOM changes while iterating.

If we need to speed up the iterator, it can be done similarly to the way the
subtree iterator was improved (and similar to the way copying a range was done
in nsDocumentEncoder).

But it's the statelessness of Find that is making it slow.  It could be quick to
find adjacent matches with the current iterator.
*** Bug 84652 has been marked as a duplicate of this bug. ***
Whiteboard: [perf] → [perf][nav+perf]
I made a test...

Trying to look for "Rosen" in the LXR view of nsCSSFrameConstructor.cpp, I
left the machine running. The string is on the 21st line or so. What a hard
thing to find... It took 17 seconds to find it!

And I think I know why (suspense :-)

The LXR view of the source is mostly made of a PRE element of ***13476***
lines (as of 2001-06-06).

|nsFindAndReplace::DoFind| calls |nsTextServicesDocument::GetCurrentTextBlock|,
itself calling |nsTextServicesDocument::CreateOffsetTable|.

And that last method just seems to build a string made of the aggregation of ALL
the text nodes in the PRE element!!!!

We should only aggregate text nodes until the total length is for the
first time >= the length of the string we look for. If the string we look for
is not found in that aggregation, remove the first text node from the aggregation
and add the next text node in traversal order. Of course, generated nodes are
excluded, and the algorithm should reset the aggregation at the boundaries of
block-level elements and replaced elements.

That should reduce the time spent finding "Rosen" in
nsCSSFrameConstructor.cpp from 17 seconds to something quite negligible.

excellent debugging 
We *should* search in generated content, since the user doesn't know whether it 
is real content or not.
Blocks: 93204
This is a VERY user-visible performance problem (and not a minor problem; it
makes Find close to useless and almost always aggravating); adding topperf
(IMHO).  Nominating for mozilla 0.9.4.  There's lots of good analysis here. 
Someone needs to make this get fixed.
Ok, here's some analysis:

Obviously IndexOf is the main culprit here.  The question is why (since IndexOf
is pretty optimized).

The answer is that nsContentIterator::Next() has some optimizations to cache
indexes for the pre-order case (parent, then children), but none for the
post-order case (children, then parent (depth-first)).  And, of course, there
appear to be NO uses of pre-order traversal.

The uses in the jprof all come from nsTextServicesDocument.cpp (which is part of
the Find and Replace code).

I may have some sort of patch for this tomorrow.
I have a patch completed.  It doesn't help find much because
nsTextServicesDocument::GetFirstTextNodeInNextBlock() calls PositionAt()
constantly.  My change required PositionAt to use IndexOf() to rebuild the stack
of indexes.  ARGH.

I'm going to fix GetFirstTextNodeInNextBlock() to clone the iterator (then throw
away the clone) instead of using PositionAt() to rewind it.  (and add cloning
code to nsContentIterator).

Sigh.
The patch I attached is work-in-progress (though pretty good).  It seems to work
perfectly and is O(1) for Next/Prev, however Find isn't much faster - because
nsTextServicesDocument calls PositionAt all the time, and it forces me to
regenerate the stack.  (*Sob*)

I'm going to solve this two ways: first add ability to clone the iterator
(generally useful), and modify nsTextServicesDocument to use a clone instead of
using PositionAt to rewind the iterator.  Second, I'm going to design (and maybe
even implement) a smarter PositionAt rebuilder that rebuilds until it hits a
common ancestor.  This may not be needed if PositionAt isn't used often after I
make the first fix.
*** Bug 93204 has been marked as a duplicate of this bug. ***
Worse yet: nsTextServicesDocument uses PositionAt EVERYWHERE.  So I'm going to
have to implement a fast PositionAt.  Still working on it.
Summary: Find on Page extremely slow → Find on Page [and in view source] extremely slow
Really, even once that is fixed, there will still be fundamental problems with 
the current code, since IIRC what it's using the iterator to do is copy the 
entire HTML document into a string and build up data for going in reverse from a 
position in the string back to the content nodes.  What we really need is an 
faster iterator that can be used to lazily walk through the document while 
looking for the required string.  (This algorithm should be designed so that the 
whitespace-condensing that needs to happen for Find is easy to do.)  It would 
also be good if the searching algorithm were better than O(N*M) in the worst 
case.  Perhaps it would be good to use a Unicode-friendly variant of the 
superlinear algorithm (|js_BoyerMooreHorspool|) used in jsstr.c (which is O(N/M) 
in the usual case, and O(N) in the worst case), although see also bug 82054.
Just to let you know, guys, that the owner of the Text Services code, Kin, is on 
vacation until next week.
This is a current jprof (without any improvements to ContentIterator).  This
only covers the actual searching.  As you can see, it's _quite_ clear.  (I
wanted to re-verify the older jprofs, and as a baseline for improvements.)  
NOTE: that's not a final (or even compilable) patch, it's meant to be for
discussion/review.  Ignore (for now) the #if 0'd out save/restorestate code, and
the code for Clone() (though that may be useful in the end).

The base change is to keep a stack of indexes.  The trick is to make
PositionAt() fast, since it's called constantly by nsTextServicesDocument.  This
patch does a smart rebuild of the index stack, looking for the lowest common
parent node and stopping the rebuild there.  Since almost all uses have the new
position near the old one, this should be close to optimal.  (Even better would
be to not have to do it, but that would require a fancy save/restore state
interface (the one commented out isn't fancy enough), or extensive or total
rewrite of nsTextServicesDocument.)

Note: the last set of changes to this have been compiled but not run.  (The
previous version has been run.)  I may be able to jprof this later tonight;
tomorrow at the latest.
Success!  I now have a patch that's _dramatically_ faster.  On a bunch of
searches, IndexOf didn't even show up on the jprof, and the response to a search
was basically instantaneous, even at the end of a 350K jprof file.

NOTE: Something EVIL is being done in nsTextServicesDocument when you click Wrap
Around.  Click that, and performance goes down the tubes - even if it doesn't
wrap over the end in the search.  I'm going to profile that and file a bug
against TextServices; that's not a bug in the iterator.

I'm cleaning up my previous patches (there was a minor bug in that last upload,
by the way) and will post here for r=/sr=.
I was so happy to see progress here I briefly looked at the code, and here are
some comments (this is not a full review, this is not my area):

1. You might want to guard against null parameters to functions.
2. In nsContentIterator::Clone() I think you should return out of memory error
if new did not succeed.
3. I believe typecasting 0 to (void*) is not needed, but it does no harm either.
More like a question of style.
4. Some functions return not implemented error. If they are meant to never have
implementation it would be nice to say so in a comment in the function, and if
they are supposed to never be called you'd better add an assertion.

Thanks Heikki.

Null parameters: I didn't explicitly add any checks, and I don't think I added
any vulnerabilities to null parameters.  I'll make a pass over and add any checks
that seem to make sense.

out-of-mem error from clone(): probably; I'll check it.

Typecast ((void *) 0) is to quiet compilers about passing an int to something
that wants a void*.

The Not Implemented errors are for Clone(), which we may want to implement
someday for those classes.  I may add an assertion, though, to warn someone they
need to implement it if they happen to start calling it.
rjesup: couple of comments on your comments replying to heikki's comments :-)

- null parameter checks: no need if you trust callers to be C++ only code where
there is no interface confusion or programmer incompetence (and the latter can't
be cured completely with null checks, as there are many bad pointer values!).
If a caller might be JS or another such language, in params that should be
interface pointers might be null, so it pays to check in scriptable interface
method implementations.  In no case should inout or out parameter pointers (NB:
not pointer parameters!) be null-checked.

- (void *)0: please use nsnull When In Rome.  nsnull expands to 0, which is
always valid in a pointer context.  If you need to disambiguate the type of an
actual argument to an overloaded method, please raise a red flag -- we should
unoverload such methods on general principles.  Also, (void*)0 is not compatible
with all null pointers, according to the standard.  Function pointers, IIRC, may
not be compatible with void pointers.  Not germane to your uses of (void*)0
here, but worth mentioning to spread the news.

- not implemented failures: use NS_NOTREACHED("method-name-here"), FYI in case
that assertion variant wasn't familiar to you.

My comments on the patch:

1.  Why nsAutoVoidArray mIndexes?  What are the stats (mean, variance) on its
population?  I don't doubt that ancestor lines are typically short, but I'd like
to see numbers for top100 or similar docs.

2.  Why strongly type the nsAutoVoidArray *aIndexes parameter, when it could be
an nsVoidArray*?  Should it not be const nsVoidArray& instead, to better show
that it is an in param?  Ok, if the answer to 1 is "the population almost always
fits in an nsAutoVoidArray", then it makes sense to use the narrowest type, but
nsAuto* in a formal parameter type is usually over-narrow.

3.  NS_ERROR_OUT_OF_MEMORY, not NS_ERROR_FAILURE (er, what heikki said -- sorry,
didn't mean to repeat).

4.  Ah, I'm slow: you want to cast small-ish integers to void* to store them in
an nsVoidArray.  Too bad we never unified the Uint32Array alecf and/or roc were
hacking on under xpcom/ds.  There are nasty 64-bit portability issues here, see
bug 20860, and maybe you can beat that patch in and help it by including the
NS_PTR_TO_INT32 and NS_INT32_TO_PTR macros in your patch here?

5.  Style nit: bracing then against dangling else may be prudent (I think not),
but if the else clause returns, why not invert, so that instead of

+    if (!mPre)
+    {
+      *aSibling = parent;
+    }
+    else
+      return GetNextSibling(parent, aSibling);

you have

+    if (mPre)
+      return GetNextSibling(parent, aSibling);
+    *aSibling = parent;

Same for the GetPrevSibling case later, and maybe for similar if/else structures
that do not contain early returns, but which might want to test mPre rather than
!mPre for symmetry.

6.  Another style nit, already in the code before your patch but I thought I
would ask: why is nsCOMPtr<nsIContent>() better than nsnull, these days?  Long
ago, one needed null_nsCOMPtr() or an otherwise default-constructed temporary,
but no longer.

7.  Any thoughts on whether nsDeque should be revised to extend nsVoidArray?  At
the least, the stack-like uses in your patch could use some Push, Pop, Top, and
ReplaceTop sugar-methods.

Thanks for digging into this -- keep the patches coming!

/be
More information on why Wrap-Around is slow: it's spending ~85% of its time in
nsTextServicesDocument::PrevBlock for some unknown reason.  I can't imagine why
it would do this, since I'm searching forward, and I never actually wrapped.
I'll file a separate bug against that.

At this point, unless I believe there's reason not to, I use nsAutoVoidArrays. 
In this case I think it's fairly likely it'll fit.  Also, I doubt we'll have
more than a couple of nsContentIterators live at any given time.

The strong typing on the Constructor method probably isn't needed, but that's
really an internal-only constructor for use from Clone() (in fact, I should make
it protected or private I think).

I'll look at using those macros.  nsVoidArrays are used for integers elsewhere
as well; and yes I agree it's annoying.  (Better would be for it to be a
template class to provide type-checking/etc - sicking was going to spec out a
design and post it to n.p.m.xpcom).

Style nits: I'll look at them.  I coded it that way because it fit the old logic
better, and so it was easier for me to check the correctness of my rewrite. 
I'll add the nsnull's.

I've never had a chance to look at deque...  Sounds like it would be a good
merge with voidarray perhaps.

Thanks brendan.
If 'wrap around' is on, it has to call nsTextServicesDocument::PrevBlock a bunch 
of times to get the index of the starting block, which is remembered so that 
searching can stop when that index is reached again after searching the entire 
document.
This is the culprit for wrap around: 

nsFindAndReplace::GetCurrentBlockIndex()

which is basically:

  PRInt32 blockIndex = 0;
  do {
    aDoc->PrevBlock();
    blockIndex++;
  } while (/* not yet back at the start of the document */);

As you can see, that's REALLY bad when you're deep in a large document.

It's accentuated a LOT by the implementation of
nsTextServicesDocument::PrevBlock, which calls the various
{Get,Find,}FirstTextNodeIn{Prev,Current,Next}Block() many times to do a
PrevBlock.

Even a trivial (and still expensive) recoding of the above to iterate forwards
from the start until it was in the same block would be vastly faster.  The best
solution would be to modify the interface to nsTextServicesDocument to support a
wrap-search more directly, so we don't have to find the current index in
nsFindAndReplace the very hard way.
Brendan: I left the if (!mPre)/etc code alone.  I think in context it makes more
sense (to the eye, given the clauses above and below) the way it is.  It also
should be just as efficient given a reasonable compiler.  Ready for r/sr I
believe.  Just about everything else mentioned was done; I couldn't find any
additional places null checks are needed (there are quite a few already).
Why no 'const nsAutoVoidArray& aIndexes' parameter type change for
nsContentIterator's ctor?  This is a little more substantive a nit than the
others I picked, I think, although it's not a big deal.

/be
Forgot that one, sorry.

imoT found a bug introduced in selection by this (nsContentSubtreeIterator
didn't properly update the index stack).  Fix for that is in testing.
The subtree bug is semi-fixed.  No longer crashes, generally works but produces
one weird behavior.  I'll update the patch tomorrow once I've shaken it out for
a day or so.  
Ok, that patch updates all the subtree code; it was making assumptions about
how GetNextSibling/etc. work that I broke.  I've verified that drag selection
works correctly again (imoT's problem) and all highlighting.  I also ran debug
code (in there, compiled out) to verify all nodes returned in subtree Init and
Next against an unmodified nsContentIterator, and they all check out correctly.

r=/sr=?
Blocks: 91351
Updated per kin's comments on IRC.  I also found by inspection a very subtle
issue about how end nodes are handled in subtrees that may have been entirely
accidental.  I restored the old behavior and marked it with XXX and comments.
kin is reviewing the latest patch (emailed to him); he'll respond tomorrow.  If
it's ok by him I'll add it here.  Sorry about all the patches, I was modifying
and uploading as he made requests.
A couple of questions/comments/issues I see so far in the 08/17/01 09:46 
attachment:



* First() and Last() don't reset mIndexes. They can be
  called at any time just like PositionAt().



* nsContentIterator::Init() doesn't build up an mIndexes
  array necessary for an immediate call to Next() or Prev().
  Imagine the following example:

    <div><b>bold</b> some text <i>italics</i></div>

  The DOM tree for the above example would look like this:

              <div>
                |
  +-------------+-------------+
  |             |             |
 <b>      " some text "      <i>
  |                           |
"bold"                    "italics"


  Now suppose the start container and offset in the range were
  the text node containing "bold" and the end container and
  offset were the text node containing "italics". Walking through the Init()
  code by hand, I think when we exit, we have mCurNode == the "bold"
  text node and mIndexes contains only one item, a zero. Wouldn't that
  create an error situation if Next() was then called, since there
  aren't enough indexes in the array to get to " some text "?



* In nsContentSubtreeIterator::Prev() we can avoid the expensive
  calls to RebuildIndexStack() if we just used a temp index array
  which starts out as a copy of mIndexArray. Then we could just
  override the contents of mIndexArray with the contents of the
  temp array in the places where things succeeded. Isn't copying
  the index array much cheaper on average than rebuilding the
  indexes?
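A rough sketch of this copy-then-commit suggestion, using std::vector as a stand-in for the real index array (TryPrev and PrevResult are hypothetical names): work on a scratch copy, and only overwrite the real stack once the step has succeeded, so a failure never forces a rebuild.

```cpp
#include <cassert>
#include <vector>

// Hypothetical copy-then-commit shape: a failed step leaves mIndexes
// untouched, so no RebuildIndexStack() call is ever needed.
struct PrevResult { bool ok; std::vector<int> indexes; };

PrevResult TryPrev(const std::vector<int>& mIndexes, bool stepSucceeds)
{
    std::vector<int> scratch(mIndexes);      // cheap copy, not a rebuild
    if (!stepSucceeds)
        return { false, mIndexes };          // commit nothing on failure
    if (!scratch.empty())
        --scratch.back();                    // e.g. step to previous sibling
    return { true, scratch };                // commit the scratch copy
}
```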



* GetTopAncestorInRange() seems to remove everything except the
  first item in aIndexes while parent is in the range. I'm not sure
  that's correct? We need to keep in mind that mCommonParent can
  be higher in the parent hierarchy than the outAncestor returned
  by this method, so the number of items in the index array
  when we return should be equiv to the number of parents between
  and including mCommonParent and outAncestor, right? This would
  ensure that we had enough info in the index array to avoid
  calling IndexOf() when using the various utility methods.



* The more I look at this, the more I understand jfrancis'
  (the original author's) code grouping. GetNext/PrevNode() was
  supposed to be a utility method that decided what to return
  next based on Pre/Post iteration, and GetNext/PrevSibling()
  was supposed to be a simple utility method that got the
  equivalent of the next sibling as if the entire tree were
  flattened. With your changes some of the Pre/Post iteration
  is still done in GetNext/PrevNode and some of it is done
  in GetNext/PrevSibling() which I think kind of clouds the
  logic. Also, when used by the SubtreeIterator, most of this
  Pre/Post code in GetNext/PrevSibling() is turned off,
  basically restoring it to what jfrancis originally had.
First, Last: Obvious mistake, sorry.  Fixed to basically just call PositionAt(),
which does an optimal partial rebuild of the index stack.  Note: this meant that
the SubtreeIterator needed to support PositionAt (or rather, not override it).
I couldn't see any reason not to.  If we want to keep PositionAt non-public in
the subtree iterator, we could make it protected, or override First and Last and
do rebuilds, or just decide it's not important and do rebuilds for First/Last in
all cases.

nsContentIterator::Init(): I think you're right, because of the call to find the
common ancestor.  Now it ignores the index array until it either calls
MakeEmpty(), or it succeeds and calls RebuildIndexStack.

nsContentSubtreeIterator::Prev: the RebuildIndexStack calls are more expensive
than copying an array, but they should only happen when we hit the beginning of
the subtree or something else goes wrong.  Copying the index array on every Prev
call would, IMO, be more (total) overhead than rarely rebuilding the index stack.

GetTopAncestorInRange: cut-and-paste typo.  It was only being called with an
actual array parameter from Prev(), and most code doesn't call Prev on the
subtree iterator.  That should have been:
      aIndexes->RemoveElementAt(aIndexes->Count()-1);
(Remove the last element)

The reasoning for moving code from Next/PrevNode() to GetNext/PrevSibling() was
to unify the code; at the time I did it I was concentrating on nsContentIterator,
and didn't realize that the SubtreeIterator used them as well.  I did it mostly
to unify the logic, which was almost identical for Pre and Post, with just minor
differences.  BTW, there is NO code that uses Pre that I can find.

Given that Subtree needs what amounts to the Pre version of GetNextSibling (and
the Post version of GetPrevSibling), it may make sense to use just the one
parameter, so NextNode becomes:
...
  // else next sibling is next
  return GetNextSibling(cN, mIndexes, ioNextNode, mPre);
}

and change the conditionals in GetNextSibling from (!mPre && !aNoParents) to
(!aNoParents).  (Similar for GetPrevSibling).

Or we can simply re-split GetNextSibling(), and have a bunch of (almost)
duplicate code in NextNode.  (Ditto for GetPrevSibling() and PrevNode).  I tend
to prefer parameterized code to duplicate slightly-variant code, but that is
just personal preference.  (I find it easier to maintain when you don't have to
make similar changes in several different places.)


My apologies for missing some of these things, and making so many go-rounds.  I
was very focused on finding some way to save cycles in what was a really nasty
hotspot, and didn't take as much time to look over the whole usage as I should
have, plus I started out not understanding how this class was used.  The patch
is much better now due to all your comments I believe.  (And imoT's testing)
Reading through the comments here, I think one assumption people (including me)
have (mostly) been making, that no one changes the content tree out from under
the iterator, is incorrect, at least in editor and maybe others (compose).

DeleteSelection(), InsertText(), etc modify the content tree.  We have to adjust
any cached indexes when this happens.  I'm working on this - I plan to add a
RebuildState() method to be called after modifying content (such as from
nsTextservicesDocument::AdjustContentIterator(), or ::InsertText() (add a call
to Adjust, etc)).

Alternatively, I could add some double-checking all the time (verify the indexes
using GetChildAt() on _every_ call, all the way back up the stack).  This would
obviously be expensive, though not as bad as the original code (IndexOf).

Or we could set a bit in the contentiterator to tell it to rebuild on the next
call (a bit of a pain).  Or allow the iterator creator to specify if we keep an
index stack (and thus guarantee to not change the content, or to tell us if it
does).  

The question becomes: do we know who may modify the document while we're
searching it?  What about dhtml?  Find in a still-loading page?  How does
deletion/insertion affect subtree iterators (and in particular their other
start/end arrays)?  TextServicesDocument should be not too hard (modulo the
range issue).

MAN O MAN do I wish that content nodes were doubly linked to siblings.  Then
this would ALL be moot.
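For illustration, a minimal sketch of what doubly linked siblings would buy (hypothetical Node, not the real nsIContent): backwards steps become single pointer loads, so no index stack or IndexOf scan is needed at all.

```cpp
#include <cassert>

// Hypothetical content node with a back-link to its previous sibling.
struct Node {
    Node* mPrevSibling = nullptr;
    Node* mNextSibling = nullptr;
};

// Make b follow a, maintaining both directions of the link.
void Link(Node& a, Node& b)
{
    a.mNextSibling = &b;
    b.mPrevSibling = &a;
}
```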
BTW, thinking about it, the "check on each call" isn't _that_ expensive.  It's
like this:

if (GetChildAt(parent, mIndexes[mIndexes.Count()-1]) == mCurNode)
   indx = mIndexes[mIndexes.Count()-1];
else
   indx = parent->IndexOf(mCurNode);
indx++;

...
mIndexes.ReplaceElementAt(mIndexes.Count()-1, indx);

This would only call IndexOf if our index was wrong, and GetChildAt isn't _too_
bad (though it still has a cost).  It's also guaranteed safe in the face of content
changes, so long as the current node (and parents) isn't deleted, which is
already dealt with in TextServicesDocument.
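A self-contained sketch of this check-on-each-call scheme, with std::vector<Node*> standing in for a parent's child list and NextIndex as a hypothetical helper (not the actual nsContentIterator code): the linear IndexOf scan only runs when the cached index turns out to be stale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {};  // stand-in for a content node

// Verify the cached index before using it; fall back to a linear scan
// (the moral equivalent of IndexOf) only on a cache miss.  Returns the
// index of the next sibling of mCurNode.
int NextIndex(const std::vector<Node*>& children, Node* mCurNode,
              int cachedIndex)
{
    int indx = -1;
    if (cachedIndex >= 0 &&
        static_cast<std::size_t>(cachedIndex) < children.size() &&
        children[cachedIndex] == mCurNode) {
        indx = cachedIndex;                          // cache hit: O(1)
    } else {
        for (std::size_t i = 0; i < children.size(); ++i) {
            if (children[i] == mCurNode) {           // cache miss: scan
                indx = static_cast<int>(i);
                break;
            }
        }
    }
    return indx + 1;                                 // advance to next sibling
}
```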

BTW, that issue (deletion of the current item requiring resetting the iterator
position) means that the only issues are insertion (which is always safe), and
places that are already calling PositionAt (or equivalent) in case the current
node is no longer valid. This means basically it's only TextServicesDocument we
need to worry about.   I think.
While I still haven't actually looked at the patch, that suggestion seems fine
for verifying the tree. I forget who I discussed this with on IRC a couple of
months ago (smfr, maybe?), but I think that's what we came up with.

GetElementAt should be O(1) - it's a simple array lookup. You really can't get
much faster than that, I think. If nsVoidArray::ElementAt were moved into the
header file so that it could be inlined, you'd probably even eliminate the
function call and indirection (mImpl), turning this into one addition + one
pointer dereference. Is there a reason that's not done?
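The inlining point can be illustrated with a minimal stand-in for nsVoidArray (not the real class): with ElementAt defined in the header and marked inline, the call boils down to a bounds check plus one add and one dereference.

```cpp
#include <cassert>

// Toy stand-in for nsVoidArray: ElementAt is defined inline, so a call
// compiles down to a bounds check, one addition, and one dereference.
class VoidArray {
public:
    VoidArray(void** aData, int aCount) : mData(aData), mCount(aCount) {}

    inline void* ElementAt(int aIndex) const
    {
        if (aIndex < 0 || aIndex >= mCount)
            return nullptr;
        return mData[aIndex];   // one addition + one pointer dereference
    }

private:
    void** mData;
    int    mCount;
};
```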

What if the current node is reparented, so that the cached parent is still at
the same place, but we are not its child anymore? Can this happen? (drag and
drop?)

wrt generated content, see the 2001-06-03 01:05 comment.
Since we'd only be verifying the current node's index, there's no problem with
the parent's index being moved.  The parent's index in the array would be wrong,
but that doesn't matter until we go up a level - and then we'd end up
reverifying that index with its parent. This would be O(1), as you say, though
with a larger K - but not large.

Note that if there were a chance that the current node were deleted, code using
the iterator would have to use PositionAt (as DeleteSelection does).  The only
users of PositionAt are TextServicesDocument (which we understand - there are
two things that modify the content there; DeleteSelection and InsertText, though
it's possible that callers might modify it) and nsHTMLEditor.cpp, which is just
using it to set an initial position.  However, as I mentioned, callers to
TextServicesDocument might add things (or delete things other than the current
node).

The safest solution is as per my comments above - reverify the index before each
increment.  Patch to be posted.  (I'm afraid there might be an award for a bug
with the most versions of a patch...)
Patch as per my last comment; reverifies index on each call.  Note that this
does not change the issue that if you remove the current node from the tree the
iterator will become confused.  The only code I know that deletes while
iterating (nsTextServicesDocument::DeleteSelection) checks for that and
calls PositionAt() to move the iterator to a valid node.

A separate bug to inline ElementAt/IndexOf/etc has been filed as bug 96108.
Assignee: kin → jfrancis
Status: ASSIGNED → NEW
jfrancis will take ownership of this bug.  discussed in performance meeting today.
095
Status: NEW → ASSIGNED
Target Milestone: mozilla1.0 → mozilla0.9.5
I'm still plugging away at this.  Should have something worth testing in the next 
day.  I'm making RJ's index stack approach only apply to nsContentIterator.  For 
nsSubtreeIterator it shouldn't be much of a win over a trivial "one level"
cache, and it complicates the code quite a bit.  The reason it shouldn't make a
big difference is that the subtree iterator doesn't pop down and then back up
the way the regular content iterator does.  Example:

You are iterating over:

0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0---0
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
|       |       |       |       |       |       |       |       |       |
0       0       0       0       0       0       0       0       0       0

In the content iterator, you will be constantly diving down into the children,
and then back up to the parents (assuming a post-order iterator).  With the
subtree iterator, you will never touch the children, and will just hit the
top-level parents in order.  So a simple cached index (instead of a stack)
should be sufficient and less complicated.
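A toy sketch of the single cached index idea for the subtree case (hypothetical shape, not the real nsContentSubtreeIterator): since only the top-level nodes are ever visited, one integer replaces the whole index stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical subtree iterator: it never dives into children, so a single
// cached index of the last position is enough - no index stack required.
struct SubtreeIter {
    std::vector<int> topLevel;   // stand-ins for the top-level nodes
    int mCachedIndex = -1;       // one-level cache

    const int* Next()
    {
        ++mCachedIndex;
        if (static_cast<std::size_t>(mCachedIndex) >= topLevel.size())
            return nullptr;      // walked off the end of the subtree
        return &topLevel[mCachedIndex];
    }
};
```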
This patch is a more minimal implementation of RJ's index caching scheme.  I get
underflows when testing it with Find.  Selection and editor actions (which both
beat on the iterators) seem to work fine, though.

I'll debug the underflow issue, but I wanted to post the patch now so others
could begin commenting.
The underflow is from PositionAt() I'll bet, since that changes mCurNode, and I
don't see any mods to it.  That might be ok - the code can rebuild the stack in
a lazy manner after PositionAt; it might even be more efficient given the
frequent PositionAt calls - however, since PositionAt usually doesn't move the
iterator far, my PositionAt code was pretty efficient at rebuilding the tree.  My
tests indicated that PositionAt is called so often that even a simple "IndexOf()
back to the root" on each one was way too expensive, which was why I made the
more complex "rebuild only changed part of the index stack" version.

If you do a lazy rebuild, though, you'll need to at least clear the index array
in PositionAt.  I advise benchmarking it against mine before finalizing on that.
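For comparison, a toy sketch of the lazy-rebuild alternative (hypothetical LazyIter, not the real iterator): PositionAt only invalidates the cached indexes, and the next traversal call rebuilds them on demand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical lazy-rebuild scheme: PositionAt() clears the cached index
// stack instead of rebuilding it; the next Next() rebuilds on demand.
// Positions here are a single flat index, for illustration only.
struct LazyIter {
    std::vector<int> mIndexes;   // cached path; empty means "stale"
    int mPos = 0;

    void PositionAt(int aPos)
    {
        mPos = aPos;
        mIndexes.clear();        // lazy: defer the rebuild
    }

    void Next()
    {
        if (mIndexes.empty())
            mIndexes.push_back(mPos);   // rebuild on demand (trivial here)
        ++mIndexes.back();
        mPos = mIndexes.back();
    }
};
```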
Thanks for feedback.  I hadn't examined kin's textservices code and didn't know 
he was using PositionAt.  I'll mix back in your work there - don't want to remove 
juicy bits when we need the juice!
latest patch.  This one is pretty real, functionally.

I have incorporated RJ's PositionAt() optimizations.  Thanks to RJ for doing all
the heavy lifting here.  I also removed the two places I accidentally left
unneeded IndexOf() calls in my previous patch.  Seat-of-pants testing on large
lxr files reveals that the performance of this patch is on par with RJ's final
patch.

I also didn't have any correctness problem in the editor or selection, or even
hit any cache misses doing Find.

Kin has some changes he wants that I have not yet incorporated.  If folks would
start testing and putting any requested changes into the bug, that would be great.
Looks good to me; I'd be willing to r= it (having been through the code
thoroughly before).
*** Bug 99881 has been marked as a duplicate of this bug. ***
this latest patch incorporates review requests from Kin.  I think we are ready
to rock.  Kin, are you ready to sr?
Whiteboard: [perf][nav+perf] → [perf][nav+perf] fixinhand; need r,sr
Comment on attachment 49530 [details] [diff] [review]
content/base/src/nsContentIterator.cpp take#3

Looks good to me
r=rjesup@wgate.com
Attachment #49530 - Flags: review+
The above patch has some final tweaks from Kin's super-review.  Cleanup,
comments, and minor changes only.
Comment on attachment 49844 [details] [diff] [review]
content/base/src/nsContentIterator.cpp

sr=kin@netscape.com
Attachment #49844 - Flags: superreview+
Attachment #49844 - Flags: review+
Whiteboard: [perf][nav+perf] fixinhand; need r,sr → [perf][nav+perf] fixinhand; ready for checkin
Can we commit this soon?  I think we're done with r/sr...
fix landed on trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I tested this with great results on:
  mozilla.org
  View Source of mozilla.org
  slashdot.org
  View Source of slashdot.org
  lxr.mozilla.org/mozilla/source/widget/src/windows/nsWindow.cpp

It takes about a second to search for "e" on the lxr page, but that's almost
6000 lines of code.  It took the same amount of time to search for
"ScheduleHookTimer", which doesn't appear until line 5634.  I'm running a
PII-500.  I find the time acceptable, given the enormous size of the page.

I wanted to test this on View Source of the lxr page, but couldn't because View
Source of that page maxes out my CPU.

So, looks really good on Windows using 2001092403.  Anyone for verifying this
using Linux and Mac?

Nice work, guys!  This fixes one of the large usability complaints I have with
Mozilla.
A vast improvement, testing on Linux, new CVS.

Ran a query in bugzilla that returned 6583 hits: bugs of all resolutions except
closed, with "window" in summary.  (BTW, there was no hang after it had loaded!)

Then searched for the word "work".  Unlike before, there was now no slowdown
when finding the next occurrence of the word: the time it took between finding
the 399th and the 400th occurrence was as short as between finding the 99th and
the 100th.  And the time between clicking "find" and the word being found was
comfortably close to instant each time.

This is fixed. Thank you all.
(Damn mid-air collisions...)

On linux (k6-2 300 w/ 256MB) a search for "for" on this page took less than
1/4 of a second per click of "find", regardless of which half of the page it
was in.

View Source of this page, same search: each time it took three seconds to find
the next "for".  This was again consistent throughout the length of the "page"
being searched.

I wonder if find in View Source could be sped up? I noticed just opening View
Source on this page took several seconds (for the content to be displayed), so
maybe it's an entirely different animal. Thoughts?
A view source document, with highlighting on, is on the order of 10 times bigger
(just raw source wise) than the document it's the source of.  For documents with
lots of markup, 20 times is not uncommon.  There's just a lot more nodes to deal
with there....
Just tested on linux on
view-source:http://lxr.mozilla.org/mozilla/source/widget/src/windows/nsWindow.cpp
with syntax highlighting _on_

Searching for "ScheduleHookTimer"

Without patch: 3 mins, 40 secs
With patch: 6 secs

I'd say verified on Linux.  :)
(Damnit another mid-air...)
Yup, with highlighting turned off it's lightning fast in view source. Perhaps
something the View Source maintainers can think about some time...
Nominating for nsbranch+.  This is a significant user-noticable difference. 
(Will PDT notice this if it's marked Resolved?  I'm guessing not.  jfrancis: if
you're willing to entertain putting it on the branch, please reopen to put it on
PDT radar.  If you're not willing, just remove nsbranch+.  Thanks.)
Keywords: nsbranch+
I'm just wondering: I tried searching this bug 31770 page in the browser for the
word "the", and it is much faster with today's build on W2K than the same search
in view source.  Could this be because the page is in memory or cached?
Reopening to get PDT attention for the 0.9.4 nomination.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [perf][nav+perf] fixinhand; ready for checkin → [perf][nav+perf] fixed on trunk; not fixed on 094
Much as I agree this should land on the branch, it hasn't been awarded any
official nsbranch+ yet.  Still in nomination so fixing the keyword.

CCing jaimejr for nsbranch+
Keywords: nsbranch+nsbranch
Marking nsbranch+
Keywords: nsbranchnsbranch+
Note that bug 101710 may be a regression from these changes.
bug 101710 is a regression due to this patch; I have attached a patch to fix bug
101710 to it.
Updated status
Whiteboard: [perf][nav+perf] fixed on trunk; not fixed on 094 → ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf]
Note: the patch which fixes bug 101710 must be checked in to the 0.9.4 branch if
the patch for this bug is checked in. 

It's a little too late to take this one. nsbranch-
Keywords: nsbranch+nsbranch-
Correcting for the right process. Leaving as nsbranch+, but this is PDT-.
Keywords: nsbranch-nsbranch+
Whiteboard: ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf] → [PDT-] ETA: fixed on trunk; waiting for approval for 094. [perf][nav+perf]
marking fixed again since we are not going to land on 094 branch
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
okay, adding vtrunk kw, since this won't be checked into the branch. unless i'm
misunderstanding joe's last comment. :)

if somehow this *does* get checked into the branch, do remove 'vtrunk' so that
it can get back onto my radar for branch verification. thx!
Keywords: vtrunk
verified using boris' test case described in 2001-09-24 12:20 --tested Find in
both the web page and the source windows, for grins.

linux, 2001.10.18.08-trunk comm: 1.97s [page], 10.65s [source]
winNT, 2001.10.18.06-trunk comm: 1.34s [page], 6.28s [source]
mac 10.1, 2001.10.22.13-trunk comm: 1.44s [page], 6.19s [source]
Status: RESOLVED → VERIFIED
Keywords: vtrunk
Product: Core → Mozilla Application Suite
Component: XP Apps: GUI Features → UI Design