Closed Bug 97157 Opened 23 years ago Closed 23 years ago

Find in page with a large <pre> is very very slow


(SeaMonkey :: UI Design, defect, P3)



(Not tracked)



(Reporter: jesup, Assigned: akkzilla)




(Keywords: perf)


(3 files, 2 obsolete files)

This is a companion to bug 31770 and to bug 95461.

Try to do a Find in the lxr page above....
Severity: normal → major
Keywords: mozilla1.0
Keywords: perf
Blocks: 91351
Cc sfraser.
Priority: -- → P3
Target Milestone: --- → mozilla0.9.5
Posting a URL for reference:

Right on mozilla org ;-)  

On an older machine (PII-233 mhz) on Linux 2001083016 this page causes the
browser to black out the windows and become unresponsive for several minutes. At
several intervals the page renders and appears for a few seconds intermittently. 
0.9.6 for now.
Target Milestone: mozilla0.9.5 → mozilla0.9.6
This works for me now that the fix for 31770 was checked in.
This bug is still there; Kin is planning on a rewrite of TextServicesDocument to
solve it.  On an lxr of nsCSSFrameConstructor.cpp (monster file), Finding the
second occurance of "HTMLAttribute" (after finding the first) takes ~20 seconds
on my build.  (The first occurance is on line 20 or so, the second is on line
1200ish.  Finding the first takes about as long as the second.)  The fix for
31770 will have made this faster than it was, perhaps, but it's still REALLY
slow on large <pre>s.
Huh.  It only takes two seconds to go from the first instance, at line 32, to
the second instance, at line 1256.  I'm using 2001092403 on Win2K, on a PII-500.
 Considering this is a 12000-line file, I find this speed to be acceptable.  Not
ideal, a vast improvement.  If there's more work coming to speed this up even
more then great, I guess.
Blocks: 106961
--> Mozilla0.9.9

Trying to load-balance bug list across milestones. I will pull this bug in to a 
sooner milestone if I can.
Target Milestone: mozilla0.9.6 → mozilla0.9.9
On Kin's request, I'm looking into this.
Assignee: kin → akkana
Target Milestone: mozilla0.9.9 → mozilla0.9.8
It's time to start soliciting comments.  It seems to be working fairly well for
my small tests so far.  In a debug build, my worst-case test (search to the end
of the lxr page for nsCSSFrameConstructor.cpp), it takes 1/4 the time of the old
find (3 seconds vs. 12 on an Athlon 1600); for incremental finds the gain is
substantially larger.

As is, this replaces the find used by the browser.  XUL/JS code to make the
editor use the new find will be showing up momentarily in a separate bug.

Comment about the implementation, which reviewers need to know:
First, the code I used for IsBlock gets that information from the parser
(technically the "right" owner of this information).  However, to do this
introduces a dependency from embedding/components/find to parser and necko. 
This may not be kosher.  It's easy enough to do what everybody else does and
just duplicate the list of nodes thought to be block, and maintain it ourselves;
this is ugly but introduces no dependencies, and I can easily rewrite it that
way if the dependencies are a problem.

All the TEXT_SVCS_TEST stuff in nsWebBrowserFind is code I'm using for
performance testing -- I can switch back and forth between old and new find by
setting an environment variable (the timing stuff will probably only work on
Unix).  My plan is to remove it all before checking in, but until then I want to
leave it in to be able to get easy performance comparison numbers.

Other limitations/issues as noted in comments in the code.

Simon, I know it's a lot of code, but do you have any cycles to glance at this?
 Even if you don't have time for a code review, I'd appreciate comments on the
embedding dependency issue and any other concerns you might have; if you do have
time to look at the code it would be much appreciated.

Of course, anyone else on the cc list is also welcome to review or offer comments.
I will be glad to review this when I get back in early January.
Note: the string APIs have changed again, so line 357 of nsFind must change to read:
I'll assume I don't have to make a new patch just for that.
Still looking for reviews ...  This needs to go in by next Wednesday if it's
going to make the milestone.  Please?
How final is the patch? I'll review when you say it's ready to go in  ;)
Thanks!  As far as I know, it's ready (modulo the #ifdef'ed testing code which
should probably be removed).  There may be issues I've missed, though, hence the
plea for reviews to get more sets of eyes on it.
I'll try to look at it also this weekend
Here's another patch which I think combines Kin's changes and mine.
I've included the mods to the macbuild xml file, and Kin's changes to make it
build on windows.
Kin says it's not working right on Windows.  Unfortunately my build isn't
working today (for possibly unrelated reasons) so we haven't gotten to the
bottom of this yet.  Reviews are ongoing and I do still need reviews, and would
love to have an indication of whether these XML files do the right thing on the
mac build.
Attachment #62247 - Attachment is obsolete: true
I want to see this in, if at all possible, for 0.9.8.

--   Use nsnull instead of 0 in things like "mRange = 0" (for any pointer,
especially an nsCOMPtr).

--   nsFind::SetRange() doesn't create an iterator if it wasn't already created
(could crash).  Since you have code elsewhere for creating the iterator, you may
want to break that into a private method.

--   Minor: don't we have simpler ways of defining straightforward attribute

--   In AddIterNode(), you drop whitespace-only nodes.  Won't this cause
problems if you have something like this: <b>foo</b> <i>bar</i> and search for
"foo bar"?  Similarly for cases of searching for multiple whitespaces where
something breaks the whitespace into a separate node.

+    // nsIContentIterator.h says Next() will return error at end,
+    // but it doesn't really, so we have to check:
-- You should patch the nsContentIterator.h file if it lies

-- There are a couple of tabs; kill them

Other than that, looks quite sane.  Correct those, and  We
should get r/sr's from Kin and JFrancis too I think.

Blocks: 116405
I had some trouble with the mac project canges, so I created a new diff for the
.xml file.
Attached patch Latest patchSplinter Review
Here's a new patch, based on inputs from rjesup, kin, jfrancis.
It addresses:
- the issues rjesup brought up
- some build issues on mac and windows
- Joe's mac build xml file
- check pref "browser.new_find" (off by default) and only use the new behavior
if that's set
- don't save state between finds; instead, find from the selection position
- climb up to find nearest block parent and compare with previous one, and
clearQ if they're different (because iterator doesn't give us the parent node
in both directions)
- comment out iterator pre order since it doesn't currently work, and so don't
SkipNode in forward direction

I think this one should be safe to check in.  Joe's going to look at it after
Attachment #64913 - Attachment is obsolete: true
Looks good.  What's the answer to my paranoid worry:

--   In AddIterNode(), you drop whitespace-only nodes.  Won't this cause
problems if you have something like this: <b>foo</b> <i>bar</i> and search for
"foo bar"?  Similarly for cases of searching for multiple whitespaces where
something breaks the whitespace into a separate node.

So long as you have an answer for that (I'm probably mis-understanding the
effect fo dropping whitespace nodes),
Attachment #65181 - Flags: review+
r=jfrancis for initial landing, with preference set to disable new find code
pending further fixes. 
i meant to add to above comment that i did build testing and runtime testing on
mac for this.
A thought.  You need to find whether the _node_ is a block or whether the node's
_frame_ is a block?  Would it not make more sense to get the primary frame for
the content and check whether it's a percentage base?
Blocks: 80805
The patch (with a few last-minute tweaks for problems Joe discovered) has gone
in, preffed out by default (because it has some problems that still need to be
worked out).  To turn it on, user_pref("browser.new_find", true).  On Unix
platforms it will print out timing information, so you can do timing tests by
flipping the pref and noting the times.

Bumping the bug to the next milestone for getting it turned on, getting all the
issues worked out and removing the old code and the old files.
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Blocks: 95461
Some problems I'm seeing with new find turned on:

1. It seems to only work with the page that exists when you *first*
   bring up the find dialog ... if you visit another page and leave
   the find dialog up, it stops working, probably due to the fact that
   nsWebBrowserFind::SearchInFrame() doesn't reset the doc used in mFind.

2. Searching for a string like "read Mozilla" on finds
   the sub string in:

      For more info about us, read <a href="foo">Mozilla at a Glance</a>

   but only the word "read" is hilited.

3. If I load the lxr page for nsCSSFrameConstructor.cpp, and search for
   "pseudo", it seems to find the first occurrence, but subsequent finds do
   nothing ... no not found dialog, no hiliting or scrolling of the next
   "pseudo" word. The first "pseudo" word remains hilited. This is probably
   because find is not starting it's search after the end of the current
   selection when going forward. (Note it should start before the current
   selection when searching backwards too.)

Just jotting down some of the notes I mentinoed to akk when looking at the first
pass impl of the nsIFind interface ...

- I really think we should remove dependencies on doc, presshell, and selection
  from this low-level interface, it would allow us to use the implementation
  on different types of DOM based trees like the ones used in text and other
  types of widgets. It would also allow things like spellcheckers to find all
  occurrences of a specific word, without disturbing the current selection.
  It would also open up a bunch of possibilities for using find via JS for
  things that we haven't thought of yet.

- We should remove wrapping from nsIFind and let the app/web-browser-find
  handle it, or provide an app-level convenience interface that implemented
  it. I say this because some apps want more control or notification before
  a search actually wraps within a particular doc. The example that comes to
  mind would be searching through a multi-frame set of docs ... some apps may
  prefer to search from a certain point within a one of docs in a frame, let
  the find proceed through all the frames after it, then wrap to the top frame,
  continuing till you hit the doc we started in. 

- The find method should probably take 2 ranges, one that says only search
  between these 2 endpoints, and the other an indicator of where to start the
  search from.

- Lose the references to replace and let the app handle it.
Randell: your paranoid worry is probably correct.  In fact, that attempt at
optimization is probably not safe and should be removed.  I'll do that in the
next iteration while tracking down the other issues that have come up.

Re the document not being changed: nsWebBrowserFind has code that supposedly
changes the document if the frame isn't the same as before ... but it doesn't
work (that != never actually fires).  I wonder if that's what Simon was trying
to debug when he put the big #ifdef DEBUG_smfr right near there?  The old code
works because it resets the document for each find anyway; I hope to avoid that
and only reset when the document has actually changed (which I can probably do
by storing a weak pointer to the document in nsWebBrowserFind if storing the
frame doesn't do the trick).

Agree with Kin that we should offer a low-level interface that doesn't require
knowing about the selection, pres shell, etc.  I'll be doing that in the next
iteration (there wasn't time last night).  I also agree about removing the
wrapping from the low-level interface.  The tentative plan is that
nsIWebBrowserFind will be the easy-to-use interface that takes docshells and
handles wrapping, and nsIFind will be the low-level interface that just returns
a range.
Boris: I've been warned that GetPrimaryFrameFor is very slow and should be
avoided when possible.  But of course, one should always test assertions like
that rather than believing them blindly, so it may be worth trying.
GetPrimaryFrameFor can be slow, yes.... Especially on table elements.

My concern is what happens if the page contains something like 

<span style="display:block">
  Some random
<span style="display:block">
  text in here

Will Find end up finding "random text" in such a page?  :)
just documenting some old news so we don't forget: for a good time, search for "a page".  The behavior here may shed some light on
difficulties with the find algorithm.
Could this in some way be releated to a bug i reported here:

Some of the behaviour between this and my bug is identical. The high CPU usage,
eating up a _lot_ of ram. 

Also after visiting the page that was given as test case for this bug, i also
noticed the same behaviour as with my bug. Memory is not released, lots of CPU
cycles spend on seemingly nothing (nothing visual).

Also when going to the test page, back here, and to the test page again, keeps
increasing the memory usage of the browser.

As i wrtite this, mozilla is using 140Mb of memory, and i've only visited 2
pages plus twice the test page here, plus once the test page i included in my
bug report. 
mass moving open bugs pertaining to find in page/frame to as
qa contact.

to find all bugspam pertaining to this, set your search string to
QA Contact: sairuh → pmac
Blocks: 122061
The problem with the mozilla start page was a problem with the way the algorithm
skipped multiple adjacent whitespace.  When we hit the first non-space after a
block of whitespace, we weren't advancing to the next non-whitespace character
in the pattern string.

While I was there, I noticed a possible buffer over/underrun in the "search to
the  last space" code, so I also fixed that.

Can I get a review for that?  I'll work separately on the problems moving
between pages -- I need to restructure the high level/low level code as Kin and
I discussed.
For anyone interested: issues with the API splitting will be addressed in newly
opened bug 123087 (this bug was getting cumbersome and it was hard to keep track
of what needed to be done).
Keywords: nsbeta1+
The API split is in.  This bug is now tracking turning the feature on by default.
It's on by default!
Closed: 23 years ago
Resolution: --- → FIXED
Clarification: it's on by default in the browser. Working on the editor
find/replace ... bug 80805.
is there another bug on ripping out the old find code then?
Just filed bug 126312 -- thanks for the reminder.
Akkana, is this one really fixed? 
1. Click on the url that lists on comment#2,
it still hangs. (windows 2000, commercial build: 2002-02-22-10-trunk).

2. Click on the orginal url:
 Then, quickly, hit ^g for several times in "find in this page" dialog with this
text: nsobject, it crashes. This probably is a different issue so I already 
filed it:

Let me know. Thanks!

Patty: can you check your prefs (prefs.js and user.js) and make sure you don't
have "browser.new_find" set to false somewhere?

With a debug build from today, nsCSSFrameCnstructor.cppI get:
new find, first nsobject: 0.1 sec
new find, second (not found): 1.3 sec
old find, first nsobject: 5.8 sec
old find, second (not found): 11.1 sec

I don't see the crash, but I'm not sure I'm trying the right way to reproduce it
-- accel-G doesn't seem to do anything for me, either in the find dialog or in
the browser window when the find dialog is up, so I just tried hitting return in
the find dialog several times.  Any chance you can get a stack trace for the
crash?  Does it happen repeatably when new_find is true (the default now) but
not when it's false, or does it happen either way?  Does it happen on earlier
Thanks Akkana! Verified on windows (commercial netscape build: 2002-02-27-06-trunk)
new find, first nsobject: 0.01 sec
new find, second (not found): 0.09 sec

Product: Core → Mozilla Application Suite
Component: XP Apps: GUI Features → UI Design
You need to log in before you can comment on or make changes to this bug.