Last Comment Bug 14871 - [Find] Find whole word only
: [Find] Find whole word only
Status: RESOLVED DUPLICATE of bug 269442
: helpwanted
Product: Firefox
Classification: Client Software
Component: General (show other bugs)
: Trunk
: All All
-- minor with 40 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
Depends on:
Blocks: 481513 106961 269442 354618
  Show dependency treegraph
Reported: 1999-09-24 14:49 PDT by Michael Lowe
Modified: 2017-01-28 11:55 PST (History)
50 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

First proposed patch (9.05 KB, patch)
2003-06-27 20:30 PDT, Ben Howe
no flags Details | Diff | Splinter Review
Removes interface changes required in 1st patch (7.99 KB, patch)
2003-06-29 22:42 PDT, Ben Howe
no flags Details | Diff | Splinter Review
Uses nsIWordBreaker (with hack for  ) (8.18 KB, patch)
2003-07-01 15:36 PDT, Ben Howe
no flags Details | Diff | Splinter Review
Updated patch (still slightly buggy) (10.09 KB, patch)
2003-07-01 22:30 PDT, Ben Howe
no flags Details | Diff | Splinter Review
Fully functional (I think) (14.94 KB, patch)
2003-07-07 20:59 PDT, Ben Howe
no flags Details | Diff | Splinter Review
Code uses AdvanceNode now. (19.29 KB, patch)
2003-07-14 18:22 PDT, Ben Howe
no flags Details | Diff | Splinter Review
Fully functional (I think) - 1 year later (18.57 KB, patch)
2004-08-13 08:51 PDT, Ben Howe
akkzilla: review+
rbs: superreview-
Details | Diff | Splinter Review

Description User image Michael Lowe 1999-09-24 14:49:34 PDT
Please add "Match Whole Word Only" option to browser's "Find on this page"
Comment 1 User image Mats Palmgren (:mats) 1999-09-28 12:24:59 PDT
*** Bug 15055 has been marked as a duplicate of this bug. ***
Comment 2 User image Simon Fraser 1999-09-28 15:15:59 PDT
I've wanted this too.
Comment 3 User image leger 1999-12-08 13:41:59 PST
Updating QA Contact.
Comment 4 User image rubydoo123 2000-06-06 13:46:14 PDT
moving to future milestone
Comment 5 User image rubydoo123 2000-06-06 16:20:42 PDT
moving back to previous owner
Comment 6 User image rubydoo123 2000-07-27 13:07:05 PDT
adding help wanted keyword
Comment 7 User image Tim Powell 2001-06-20 11:17:14 PDT
BTW, this is supported by IE5, but wasn't in Netscape 4. It's helpful to have.
Comment 8 User image Arthit Suriyawongkul 2001-09-26 20:36:57 PDT
if i select the "match whole word only" option
and search for a word "file",
it will not shows me "profile"

BUT .. will it shows me "files" ?
Comment 9 User image kinmoz 2001-10-26 11:23:45 PDT
*** Bug 104046 has been marked as a duplicate of this bug. ***
Comment 10 User image Simon Fraser 2001-12-21 18:11:23 PST
A find RFE for you!
Comment 11 User image Tim Powell 2002-01-04 10:11:25 PST
*** Bug 118118 has been marked as a duplicate of this bug. ***
Comment 12 User image sairuh (rarely reading bugmail) 2002-01-24 15:25:06 PST
mass moving open bugs pertaining to find in page/frame to as
qa contact.

to find all bugspam pertaining to this, set your search string to
Comment 13 User image Michael Lefevre 2002-06-28 18:35:45 PDT
*** Bug 143180 has been marked as a duplicate of this bug. ***
Comment 14 User image Alex Bishop 2002-07-09 22:32:29 PDT
*** Bug 156517 has been marked as a duplicate of this bug. ***
Comment 15 User image TGOS 2002-07-10 01:47:35 PDT
Reply to #8:

No, searching for whole words only is like the following regular expression search:

[\n\r\t ]word[\n\r\t ]

'\n' is a linefeed, '\r' is a carriage return, '\t' is a tab sign and the last
sign is a space (not sure if '\s' for space is correct regex syntax, some regex
parsers support it, but not all; however I think a space is a space, isn't it?

So running a whole word search for "file" will only come up "file" (or "File" if
match upper/lower case isn't enabled). To make complicated searches, the search
would have to get an own syntax and that is too complicated for most users that
just want a simple search. But being able to limit searches to whole words only
can often be a big advantage and is one of the things that makes IE more
comfortable at times than Mozilla.
Comment 16 User image Akkana Peck 2003-02-12 18:01:02 PST
*** Bug 192934 has been marked as a duplicate of this bug. ***
Comment 17 User image James "Kovu" Russell 2003-03-10 20:17:58 PST
Moving from "help wanted 1.0", which obviously is not going to happen now. This
feature would be nice for 1.4a, I think. It'd be nice if we could match
functionality with Office, and Office (Word at least) allows you to search for
whole words only.
Comment 18 User image R.K.Aa. 2003-03-24 02:11:54 PST
*** Bug 198922 has been marked as a duplicate of this bug. ***
Comment 19 User image Christophe Jolif 2003-05-02 08:35:31 PDT
I would need that functionality available in nsFind.Find() method.
Comment 20 User image TGOS 2003-05-02 08:47:52 PDT
Okay, who do we have to talk to to get this into nsFind.Find()? Do we have to
code it ourselves? Will it be difficulty to code for a non expert C programmer?
Maybe I will take a look, after all I'm currently spending most time on C
programming anyway (but let's face it: I'm a newbie C programmer; pretty
experienced in Java, PHP and some other languages, but only basic C skills).
Comment 21 User image Akkana Peck 2003-05-02 13:11:11 PDT
The difficulty will be in the algorithm, not in knowing tricky C language
features.  The changes should all happen within one routine, after the fragment
has already been fetched from the DOM; if you grok "char*" and "char" the rest
probably won't be a problem.  A good programmer experienced with Java and PHP
should be able to pick up the C++ that's needed.  

Basically, I think it will come down to this: When looking to start a new word
match, don't start matching unless we've just started a word.  (It really should
use the nsIWordBreaker interface so it will work with non-Latin charsets, but
I'm not sure how to do that on the fragments of text that come out of the DOM. 
I suppose just flagging when the last char seen was whitespace is a reasonable
first start.)

It may be necessary to pay attention to how word-mode search interacts with
spaces in the search string (I'm not sure whether this will be a problem or not).

Alternately, another option is if we're in word mode, don't use the current code
at all, and instead, jump forward in the DOM itself using the nsIWordBreaker
interfaces, then compare (possibly using code factored out of the existing
interface).  That would work in all languages, but would be a lot harder
(especially for someone not comfortable with C++ or XPCOM calling conventions).
Comment 22 User image Akkana Peck 2003-05-16 11:27:38 PDT
kovu401: Don't change target milestones on other people's bugs, unless you're
willing to take them and work on the fix yourself.
Comment 23 User image Ben Howe 2003-06-27 11:39:37 PDT
Is anybody currently working on this bug?  After looking at some of the code, it
looks like a lot of the framework is already there.  I have added the rest of
the code necessary (thought it is not perfect I'm sure).  Should I work on
perfecting the code and create a patch, or would I be stepping on anybody's toes?
Comment 24 User image Akkana Peck 2003-06-27 12:56:16 PDT
I'm not aware of anyone else working on this right now; you wouldn't be stepping
on anyone's toes (unless someone else speaks up quickly).  Go for it!  I'll be
happy to review a patch if you write one.
Comment 25 User image Ben Howe 2003-06-27 20:30:01 PDT
Created attachment 126665 [details] [diff] [review]
First proposed patch

This patch does the job for me, but I don't really know how it will work with
non-Latin charsets.  I am unfamiliar with nsIWordBreaker and how to effectively
use it, so my patch basically works by requiring the letter before the match
and the letter after the match to be NON-alphanumeric.	This way, you can do a
'whole word search' for 2 words, like "aWord bWord".  Check it out and report
back any problems or suggestions.

I'm a complete newbie to Mozilla development, so go easy on me. :)
Comment 26 User image Ben Howe 2003-06-29 22:42:59 PDT
Created attachment 126727 [details] [diff] [review]
Removes interface changes required in 1st patch

OK, so I figured out enough about XPCOM to create and use a word breaker for
flagging purposes, and I removed the additional member variables from
nsIFind.idl and nsFind.h, but I still don't use the word breaker functionality
to determine where word breaks occur.  This patch still works for english, but
correctly fixing this bug (in an international sense) would require 1 of 2

1. As mentioned, write a separate find routine in nsFind.cpp that would try to
match word by word, or
2. Make an enhancement to nsIWordBreaker: add a function to determine if a
character is a "breaking" character.

Option #2 is my preferred path, but I don't know difficult it is to convince
people to change interfaces.
Comment 27 User image Ben Howe 2003-07-01 15:36:13 PDT
Created attachment 126858 [details] [diff] [review]
Uses nsIWordBreaker (with hack for  )

This patch actually uses nsIWordBreaker::BreakInBetween to detect word breaks. 
I used a hack to convert nbsp's to ' ' before detecting breaks.  The basic idea
was to not start matching until some sort of word break occurred, and don't
finish a match unless there was a word break char after the last character.

On Google's page, it catches all the words except "Groups" (not real sure why).
 Any other testing would be appreciated.
Comment 28 User image Akkana Peck 2003-07-01 18:04:59 PDT
Great stuff.  Clean and readable, works nicely, and doesn't break any of the
find regression tests.  (Which, incidentally, live at
if anyone's interested.  Eventually we should update them to add a test for

Some minor comments on the code (the third patch):

#ifdef XP_MAC
#define IS_NBSP_CHAR(c) [ ... ]

I'm unhappy we have to do this, and have split off bug 211343 on the issue (it
shouldn't block this bug).

+    PRUnichar nextc = (t2b ? t2b[findex+incr] : CHAR_TO_UNICHAR(t1b[findex+incr]));
+    PRUnichar prevc = (t2b ? t2b[findex-incr] : CHAR_TO_UNICHAR(t1b[findex-incr]));

Someone (maybe me) should probably do timing tests on a big page to make sure
this doesn't noticably slow things down (since speed was a big consideration in
the find design).  I don't think it will be noticable, and this is nice and
clean and readable, so I don't want to suggest any premature optimization if it
isn't needed.

+  , mIterOffset(0)
+  , mWordBreaker(0)

Please swap these two lines.  Picky gcc gives a warning if the order of
initialization doesn't match the order of declaration.

+ PRBool wordBreakRight;

It's only "right" when we're searching forward, correct?  (Assuming L2r text.) 
It's really wordBreakNext?  And likewise for wordBreakLeft?  Or am I confused there?

+             if (IS_NBSP_CHAR(nextc))
+               nextc = CHAR_TO_UNICHAR(' ');

Does this do any work in the conversion?  If so, maybe we should keep a local
variable for a unicode space.

I wonder if it's worth checking for word break right before making the range? 
If we're in whole-word mode we fail that test, we're going to throw away the
range, so maybe we should do the test a little earlier, right after "found a
match!" and right before we make the range, so we don't have to make it then
throw it away later.  Or is there a reason we need the range first?
Comment 29 User image Simon Fraser 2003-07-01 18:56:11 PDT
+#ifdef XP_MAC
+#define IS_NBSP_CHAR(c) (((unsigned char)0xca)==(c))
+#define IS_NBSP_CHAR(c) (((unsigned char)0xa0)==(c))

This looks like it was copied from the spellcheck code, but, as far as I know,
NBSPs are not different on Mac. See, for example,
Comment 30 User image Kathleen Brade 2003-07-01 19:23:48 PDT
I agree with Simon; that #ifdef seems bogus for use within gecko.  I'm not 100%
sure on the usage within extensions/spellcheck but it isn't right for Find.
Comment 31 User image Ben Howe 2003-07-01 20:02:17 PDT
> +  , mIterOffset(0)
> +  , mWordBreaker(0)
> Please swap these two lines.  Picky gcc gives a warning if the order of
> initialization doesn't match the order of declaration.

OK - done.  mWordBreaker wasn't initialized before, and maybe it doesn't really
need to be since it is an nsCOMPtr.  I assume that nsCOMPtr's constructor sets
the pointer to null, but I figured it couldn't hurt to be more clear.

> + PRBool wordBreakRight;
> It's only "right" when we're searching forward, correct?  (Assuming L2r text.) 
> It's really wordBreakNext?  And likewise for wordBreakLeft?  Or am I confused
> there?

Full ACK.  Changed in my copy.

> +             if (IS_NBSP_CHAR(nextc))
> +               nextc = CHAR_TO_UNICHAR(' ');
> Does this do any work in the conversion?  If so, maybe we should keep a local
> variable for a unicode space.

CHAR_TO_UNICHAR is just a type cast.  Also, I am not sure if the XP_MAC check is
necessary.  I just copied it out of the spellcheck code, but I did just notice
the #define NBSP_CHARCODE in nsFind.cpp, line 582 - it just uses char code 160

This issue would not be an issue if nsSampleWordBreaker::GetClass correctly
handled the nbsp (160).  It is identified as kWbClassAlphaLetter in line 105 of
nsSampleWordBreaker.cpp in a catch-all else branch.

As for the placement of the last word break check, I see no reason that it
couldn't be moved up before the range check.  I will try changing it and
incorporate it into my next patch if it works.
Comment 32 User image Ben Howe 2003-07-01 22:30:29 PDT
Created attachment 126888 [details] [diff] [review]
Updated patch (still slightly buggy)

Incorporates changes in comment #28, but when doing whole word searches for
"mozilla" in the regression testing page produces odd results.	It catches most
of the mozillas, but not all.  Here's the weird thing: it misses different ones
each time I close and re-open Mozilla.	Feedback from others?
Comment 33 User image Kathleen Brade 2003-07-02 05:39:46 PDT
>This issue would not be an issue if nsSampleWordBreaker::GetClass correctly
>handled the nbsp (160).  It is identified as kWbClassAlphaLetter in line 105 of
>nsSampleWordBreaker.cpp in a catch-all else branch.

I disagree if you are suggesting that the nbsp be treated like a space or word
breaker.  The whole point of a non-breaking space is for it to *not* break
words; it should be treated as a letter.

If I have this text:  This is&nsbp;a test
It will look like:    This is a test
But "is a" should be treated as one word.
Comment 34 User image Ben Howe 2003-07-02 06:26:04 PDT
The non-breaking part of nbsp refers to not breaking a line between words. supports this definition.  
Comment 35 User image Simon Fraser 2003-07-02 09:31:25 PDT
Yeah, I don't think the user expects   to behave like a non-space letter
for finding. As far at the user is concerned, it should behave like a space.
Comment 36 User image Aaron Leventhal 2003-07-07 03:27:57 PDT
When this moves forward, we can experiment with whether it would be a good
default for typeaheadfind/ find as you type.

In usability studies it was found that people expected find as you type it to
match only at the beginnings of words.
Comment 37 User image Ben Howe 2003-07-07 20:59:59 PDT
Created attachment 127220 [details] [diff] [review]
Fully functional (I think)

The problems with the previous patch lie in setting nextc and prevc from the
t1b and t2b arrays.  If c is at the end of the node, the nextc and/or prevc
will be outside the array boundaries.

It was easy enough to fix this for prevc: I just set prevc = c before setting c
in each iteration.  But for nextc, it is a bit more tricky.  If c is the last
character in a node, then nextc must be retrieved from the next visible node -
I used NextNode for this.

NextNode changes the member variables, so I wrote a new function that saves the
member variables, gets the next char from the next node, and then restores the
member variables.  I don't know if this the best way to implement this, so I am
open for comments.  I think the code is fully functional - I have tested it
quite a bit, but more testing is, of course, appreciated.
Comment 38 User image Akkana Peck 2003-07-09 15:38:31 PDT
Nice.  I don't have any problem with the new function -- it seems like a
reasonable approach, and makes the code easy to read.  I'm not as happy, though,
about it using the member variable mIterNode to iterate, knowing that it's going
to be restored to its previous value.  I wonder if we might be better off
rewriting NextNode to return the new node and offset, and its other customers to
do NextNode(mIterNode, &mOffset) if that's what they want; then
GetCharFromNextNode can do its work without having to save/restore mIterNode and
friends.  That would also make GetCharFromNextNode shorter and simpler.

Can the rest of GetCharFromNextNode be factored?  I guess it would take a fair
amount of rewriting of the main Find loop to group all that into one place, so
that probably isn't worth the effort or risk.  Oh, well.

Very minor question: I found myself wondering about the edge cases: what happens
if the word happens to match at the end of the document, and there's no next
node?  It seems to do the right thing in the few cases I tested (probably it's
returning additional nodes which have no text in them, so it's satisfying the
word break condition that way).
Comment 39 User image Ben Howe 2003-07-09 16:22:32 PDT
Yes, GetCharFromNextNode will return 0 if there isn't any more nodes, and 0 will
satisfy word break in all reasonable conditions.

> I wonder if we might be better off
> rewriting NextNode to return the new node and offset, and its other customers to
> do NextNode(mIterNode, &mOffset) if that's what they want; then
> GetCharFromNextNode can do its work without having to save/restore mIterNode and
> friends.  That would also make GetCharFromNextNode shorter and simpler.

I agree.  Maybe we should create a new function called AdvanceNode and use it in
all the places that NextNode is used.  Change NextNode so that it returns
iterNode and offset and then AdvanceNode would just call NextNode and set the
member variables with NextNode's returns.  This would basically turn AdvanceNode
into a wrapper that updates member variables, and "Advance" clearly indicates in
its name that it is changing the object's state.  GetCharFromNextNode would
still call NextNode, but their wouldn't be any need to save the iterator and offset.
Comment 40 User image Akkana Peck 2003-07-11 16:18:23 PDT
That sounds like a good plan ... should cause minimal disruption to the rest of
the code.
Comment 41 User image Ben Howe 2003-07-14 18:22:58 PDT
Created attachment 127767 [details] [diff] [review]
Code uses AdvanceNode now.

Check my usage of nsCOMPtr's in NextNode, AdvanceNode, and GetCharFromNextNode.
 If there are no incorrect usages, then maybe this is the final patch.
Comment 42 User image Akkana Peck 2003-07-29 15:16:55 PDT
Sorry I've been delaying on this.  I've looked at it, and I keep getting partway
through it and thinking that having the extra routine ends up being more
complicated than the last patch since we end up having to save and restore
mIterNode sometimes anyway, and maybe we were better off with the last one.

Re the use of nsCOMPtr: generally we try to avoid using nsCOMPtr as function
arguments, at least in editor code; it adds a level of complexity, and IIRC it
sometimes adds extra addrefs that don't need to be there.  Generally it's better
for the called method to just take a pointer (e.g. nsIContentIterator**
aIterator), and be explicit (in a comment in the .h file, preferably) about
whether or not it addrefs its argument before returning, and then the caller
uses an nsCOMPtr<nsIContentIterator> and calls it with getter_addRefs() or
getter_doesntAddRef() as appropriate.  (Usually, doing the addref is best.)

So anyway, I'm halfway through looking at the new code and modifying it to do
this, but I keep thinking we might be better off just going back to the previous
simpler patch since this doesn't really avoid the pushing/popping anyway.
Comment 43 User image Miguel Vargas 2003-11-14 13:00:18 PST
Ben Howe did some really good work writing this feature, but it hasn't been
added to the tree.  The last comment from Akkana says that we should probably go
back to the previous patch (the one without AdvanceNow).

Could someone review the 7/07/03 patch so we can add this feature?  The work has
already been done, it's a shame that we haven't been able to get this in.
Comment 44 User image Bill Mason 2004-05-13 06:14:24 PDT
*** Bug 243500 has been marked as a duplicate of this bug. ***
Comment 45 User image Ted Mielczarek [:ted.mielczarek] 2004-07-20 05:45:11 PDT
*** Bug 252262 has been marked as a duplicate of this bug. ***
Comment 46 User image Ben Howe 2004-08-12 15:42:46 PDT
What would it take to get one of these patches reviewed/checked in?  Even
though, it's not a frequenty requested feature, I'd hate to see this go to waste.
Comment 47 User image Akkana Peck 2004-08-12 17:53:46 PDT
Yes, you're right.  I'm sorry.  I was sort of waiting for a comment expressing a
preference for one patch or the other, and then it fell off my radar, and I
should have just picked one and reviewed it.  I'm going to pick the previous one
("Fully functional (I think)") unless someone thinks there's something in the
newer one which is better.

I may change function arguments that use nsCOMPtr.

Unfortunately, the CVS change from pserver to ssh has made it impossible for me
to cvs update or do a cvs diff, and bugzilla's diff doesn't show the last patch.
 I've sent in my key for cvs, and will meanwhile get started looking at the
diffs to refresh my memory.  They looked pretty good the first time, so I expect
we should be able to get them in (if I want it done some other way, then I
should dang well write that part. :-)
Comment 48 User image timeless 2004-08-12 20:27:52 PDT
akk: you can use cvs -d in
the interim.

echo '' > /tmp/MyRoot
find . -type d -name CVS -exec cp /tmp/MyRoot {}/Root;

you can also forcibly commit w/ cvs -d commit filelist

later if you decide you don't mind using cvs-mirror for your tree :).
Comment 49 User image Ben Howe 2004-08-13 08:51:26 PDT
Created attachment 156045 [details] [diff] [review]
Fully functional (I think) - 1 year later

I just pulled the trunk today and tried to reapply the patch.  Some interfaces
have changed over time, so I have updated to patch to reflect the changes that
happened since I wrote the original patch.
Comment 50 User image rbs 2004-08-13 21:47:48 PDT
Does your patch have any effect on bug 240932?
Comment 51 User image Ben Howe 2004-08-14 08:53:37 PDT
After doing a cursory test on the attached test case for bug 240932, this patch
does not seem to affect bug 240932.  I get the same results as described when
searching for x1, and when I search for the whole word 'x123', I find 4 matches
(all possible matches except for the grayed-out text box).

Here is something even more disconcerning, though.  Whey I type a string into
the 1 line textbox, such as 'abc', searching for the whole word 'x123' yeilds
only first result - 'x<p></p>123'.  It does not find the others.

Changing the form input at all from its default values actually seems to break
the "find whole word only" option, but this does not happen on the Regression
Testing Find In Page.  I don't know what would be causing this.  Maybe the fact
that in the test case for bug 240932, the input items are not in <form> tags
might have something to do with this.
Comment 52 User image Akkana Peck 2004-08-15 22:32:41 PDT
Comment on attachment 156045 [details] [diff] [review]
Fully functional (I think) - 1 year later


I agree that bug 240932 has some strange behavior, and also that finding in
text controls occasionally does odd things; but I couldn't make it generate any
behavior with this patch that was worse than the behavior before the patch
(even in a text field), so I don't see any reason to block this much-delayed
patch any further.

I have some very minor changes in the version that's in my tree: changed
"varaibles" to "variables" in two places, shortened a few lines >80 columns
(including some which were there before your patch), and added the following
comment at the beginning of GetCharFromNextNode because I know I'll forget why
it needs to be there (let alone other people):
// Save the iteration variables, get the next character,
// then restore the variables.
// XXX It would be nice if NextNode/PrevNode could do a "peek"
//     without changing the pointers.

Does that comment seem to describe it well enough, or can you suggest something
better to put there?

No functional changes, so I won't bother to attach a new patch unless someone
wants it.

Simon, are you still doing sr's?  You and Kin are the only SRs familiar with
the find code, but I can pass it off to someone else for sanity/overflow
checking if you don't have time.
Comment 53 User image rbs 2004-08-15 22:48:14 PDT
I have developed some knowledge of the find code (when I implemented find in
text controls) and can help with sr if sfraser or kin are unavailable.
Comment 54 User image Ben Howe 2004-08-19 14:04:51 PDT
Akkana, your comment seems to describe it well.
Comment 55 User image Akkana Peck 2004-08-30 10:08:35 PDT
Comment on attachment 156045 [details] [diff] [review]
Fully functional (I think) - 1 year later

Good point, rbs!  Can you sr this patch?
Comment 56 User image rbs 2004-09-01 00:09:28 PDT
Comment on attachment 156045 [details] [diff] [review]
Fully functional (I think) - 1 year later

sr=rbs, with the following comments:

+    PRBool wordBreakPrev;
+    if (mWordBreaker)
+    {
+      if (prevc == NBSP_CHARCODE)
+	 prevc = CHAR_TO_UNICHAR(' ');
+      mWordBreaker->BreakInBetween(&prevc, 1, &c, 1, &wordBreakPrev);
+    }
+    // Compare.  Match if we're in whitespace and c is whitespace, or if
+    // the characters match and at least one of the following is true...
+    //   a) we're not matching the entire word
+    //   b) a match has already been stored
+    //   c) the previous character is a different "class" than the
+    //      current character
+    if ( (c == patc && (!mWordBreaker || matchAnchorNode || wordBreakPrev)) ||

If you init wordBreakPrev, you can simplify the above into the
following (to avoid referring to a potentially uninitialized variable
in the test):

    PRBool wordBreakPrev = PR_FALSE;
    if (mWordBreaker)
    if ( (c == patc && (matchAnchorNode || wordBreakPrev)) ||


+	 // Check for word break (if necessary)
+	 PRBool wordBreakNext;
+	 if (mWordBreaker)
+	 }
+	 // If a word break isn't there when it needs to be, flag and reset
+	 if (mWordBreaker && !wordBreakNext)
+	   startParent = nsnull;
+	 else

Similar as above. The |if (mWordBreaker && !wordBreakNext)| makes me itchy. I
don't like 
to see a potentially uninitialized variable being used as a RHS value. Init
wordBreakNext to false.

+    // Make and set a word breaker here if searching for whole word
+    nsCOMPtr<nsIWordBreaker> wb;
+    if (mEntireWord) {
+      wb = theDoc->GetWordBreaker();
+      NS_ENSURE_SUCCESS(rv, rv);
+      mFind->SetWordBreaker(wb);
+    } else {
+      mFind->SetWordBreaker(0);
+    }

What a mouthful code. And rv is unset in NS_ENSURE_SUCCESS. 
Just simplify the whole thing into this one-liner (with a
resilient code even if the WB is null): 

     // Set a word breaker if searching for whole word
     mFind->SetWordBreaker(mEntireWord ? theDoc->GetWordBreaker() : 0);


I am not a big fan of this function, but oh well, I can live with that. 
[I would think that one could recast the find code around a struct with 
{iterNode, offset, blockparent}, but that is another story, as your experience
with AdvanceNode has shown.]

Why do you need to save mIterator in GetCharFromNextNode?

Care also to add a comment to the section "Here begins the find code.", saying,

In entire word mode, the basic idea is to not start matching until some sort of
word break occurred, and don't
finish a match unless there was a word break char after the last character.
Comment 57 User image rbs 2004-09-01 04:45:34 PDT
Comment on attachment 156045 [details] [diff] [review]
Fully functional (I think) - 1 year later

Oops... I am retracting my sr.

It just occurred to me that there is a hole in your patch (and this may explain
why it had troubles with text controls).

When you save mIterNode, it is not enough to restore the old value. Why?
Because the underlying nsFindContentIterator has moved... and so its internal
state is not in sync with the old node. Hence things go out-of-sync if you
overwrite the new node with the old node.

What you should do with mIterator is to re-PositionAt() the (old) node where
you want.

BTW, for correctness / bullet proofing, you may have to add:

nsFindContentIterator::PositionAt(nsIContent* aCurNode)
  nsIContent* oldNode = mOuterIterator->GetCurrentNode();
+ Reset();
  nsresult rv = mOuterIterator->PositionAt(aCurNode);

It is something that I have been thinking of adding, but I have been wandering
elsewhere and hadn't come back to that yet.
Comment 58 User image rbs 2004-09-01 05:01:43 PDT
You also need to sync view-selection-source (and others such as Find-As-You-Type
that use the find service -- it is a singleton and needs special caution)

i.e., set .entireWord to false and cache/restore it. Otherwise, a user may set
something on the find dialog, another code overwrite it (via the singleton
service), and next time the user looks at the dialog, they will be surprised to
see something different. Same with FAYT which needs to set .entireWord to false. 
Comment 59 User image Will Martin 2005-09-09 00:46:47 PDT
 . . . and a full year passes in complete silence.  Not to mention we're only 15
days from the bug's six-year anniversary.  Perhaps today's bug-a-thon would be a
good day to finish this off?  I've got UI changes for the find toolbar (as
opposed to the find dialog patched above).  But I haven't written JS to make the
new checkbox function, for two reasons: 1) nsIFind doesn't have that
functionality, and 2) I don't know enough C++ to fix it myself.  But I'd really,
really be happy to have "Whole Word Only" functionality back on Firefox 1.5.

Also, I have a question - why are we modifying nsIFind to add support for word
matching when nsIFindService[1] and nsIWebBrowserFind[2] both appear to have it
already?  findUtils.js uses nsIFindService,[3] and finddialog.js uses
nsIWebBrowserFind.[4]  Do we really need three different find APIs?  Is there
some difference between them that I don't understand?  Do they operate in
different contexts/projects?  Does one perform better or something?  My poor
liddle brain can't figgur it out. Help, speling scils failing. Cortex
overheting, breach immanent. Evacuate! Evac@#%$@TR$# . . . ATH0 NO CARRIER

[1] nsIFindService:
(sorry, couldn't find any native docs for this)

[2] nsIWebBrowserFind:

[3] findUtils.js

[4] finddialog.js
Comment 60 User image Simon Fraser 2005-09-09 09:15:53 PDT
(In reply to comment #59)

> [1] nsIFindService:

nsIFindService is an internal service that simply stores global find state
between finds (i.e. the search string and settings).

> [2] nsIWebBrowserFind:

This is the "real" find interface that JS and embedders should be dealing with.
Under the hood, this is an nsWebBrowserFind which uses an nsFind to do the work.

> [3] findUtils.js

This JS manages the find dialog, passing its settings into the nsIWebBrowserFind. 

> [4] finddialog.js

The dialog.

The "whole word" setting is exposed all the way through the APIs, but simply not
implemented at the lowest level (nsWebBrowserFind). The patch here implements
that, but failed superreview. So it needs more work.
Comment 61 User image Will Martin 2005-09-09 10:00:06 PDT
Ah.  That makes sense - they're not duplicates, but stacked APIs that inherit
functionality from one another.

I sure wish I had the C++ skills to fix the problems with the nsIFind patch,
then.  I've really got to pick that up at some point.  Thanks for the cogent
Comment 62 User image Ria Klaassen (not reading all bugmail) 2006-01-10 00:43:31 PST
*** Bug 322897 has been marked as a duplicate of this bug. ***
Comment 63 User image Will Martin 2006-10-07 03:39:22 PDT

Another year passes in silence broken only by the lone dupe report in August.  Happy seventh birthday, #14871.  Soon you'll be a fully grown bug, capable of smashing poorly defended Japanese cities before dying in terrible combat with a giant robot.

We came so close to fixing this back in '04.  Is Ben Howe still around?  Does he care any more?  Is the most recent patch still a workable base, or have the interfaces changed enough to require a rewrite?

Comment 64 User image Rachel 2009-06-16 13:08:47 PDT
This *still* hasn't been fixed?  It's 2009 already!  10 years apparently since this bug was first introduced.  IE has had the "whole word only" function for how long now.  It really is a pain to search for a word, only to have to continously click "Find next" because it finds words that CONTAIN the real word you're looking for.
Comment 65 User image David Bolter [:davidb] 2009-06-16 13:14:57 PDT
Moving to product: Firefox, for triage.
Comment 66 User image Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2009-09-07 05:44:57 PDT
*** Bug 515027 has been marked as a duplicate of this bug. ***
Comment 67 User image shum_bin 2009-09-29 22:04:08 PDT
*** Bug 518082 has been marked as a duplicate of this bug. ***
Comment 68 User image shum_bin 2009-09-29 22:13:23 PDT
Could someone please give more info on the current status of this bug.
Comment 69 User image Jo Hermans 2009-11-14 04:00:40 PST
*** Bug 528678 has been marked as a duplicate of this bug. ***
Comment 70 User image eyal gruss (eyaler) 2009-11-14 07:32:14 PST
is this not a duplicate of Bug 269442 which was wontfix?
Comment 71 User image Ben Howe 2010-01-02 11:51:21 PST
This is Ben here....I am still around hoping that somebody would roll this into the baseline, but I gave up long ago after many failed attempts at getting it rolled in.
Comment 72 User image fmcfix 2010-02-06 01:49:44 PST
This feature is rather important to implement. For example, Google searches for whole words by default. Everybody is used to it as the search results are exactly what the user wants. IE has this feature too.

The priority of this request should be changed to important. This is a long-time deficiency of Firefox which needs to be fixed. I know other people complaining about this.
Comment 73 User image kinwolf 2010-03-03 11:51:01 PST
Honestly, this should have been done long ago.  Firefox search bar feels like it's stuck in 1990 without this basic feature.  When I search for "IT" I don't want all the "italian", "itheration", etc words poping up before bringing me to the real word I am looking for.
Comment 74 User image Roman 2010-05-13 03:14:23 PDT
Please someone mark as "blocks bug 565552" (unfortunately can't do that myself) - looks like there's hope!
Comment 75 User image Roman 2010-11-13 01:38:08 PST
Mike Connor was strongly opposed to this idea in bug 269442, seemingly based mostly on the number of people who want this vs the cost of the extra "clutter".

This is so backwards!...
Comment 76 User image Mike Connor [:mconnor] 2010-11-13 09:21:53 PST

*** This bug has been marked as a duplicate of bug 269442 ***
Comment 77 User image Miguel Vargas 2010-11-13 17:10:54 PST
Wow.  After being subscribed to this bug for so many years it's almost sad to see it die such an unceremonious death.
Comment 78 User image Thomas D. (currently busy elsewhere; needinfo?me) 2014-05-10 13:03:32 PDT
If you still want this bug to be fixed, please vote for bug 269442:
("vote" link in the Importance field at the top of bug 269442).

Moved 12 duplicates to the currently active version of this bug (bug 269442), to get the record correct there:

Bug 15055, bug 104046, bug 118118, bug 143180, bug 156517, bug 192934, bug 198922, bug 252262, bug 322897, bug 515027, bug 518082, bug 528678.

Bug 269442 itself already had 6 duplicates (including this bug):
Big 14871, bug 243500, bug 348310, bug 351430, bug 410103, bug 505684

So the total number of dupes is now 18.
At the time of this comment (2014-05-10), this bug 14871 has 40 votes, and bug 269442 has 39 votes (intersections possible).

Maybe one day it'll dawn on them that this ridiculous lack of basic functionality in the primary UI should be fixed, 15 years after this was first filed. Sad.
Comment 79 User image bobagadush 2015-05-13 13:44:28 PDT
I spent about half an hour trying to figure out where the 'whole word only' option was in Firefox's search. I eventually ended up here, where I was shocked to find that not only is there no such option, but this was - and still is - intentional. WTF.

I created an account just to vote for this issue. This is ridiculous beyond belief. I've switched to other browsers just to get this before, and I will again today.

I feel insulted.

Note You need to log in before you can comment on or make changes to this bug.