Add Find Whole Word/ Find Exact String Option to Find Toolbar

VERIFIED FIXED in Firefox 50

Status

()

Toolkit
Find Toolbar
P1
enhancement
VERIFIED FIXED
13 years ago
4 months ago

People

(Reporter: M. LaPlante, Assigned: mikedeboer)

Tracking

(Depends on: 2 bugs, Blocks: 5 bugs, {ux-efficiency})

unspecified
mozilla50
ux-efficiency
Points:
13
Dependency tree / graph
Bug Flags:
firefox-backlog +

Firefox Tracking Flags

(firefox50 verified, relnote-firefox 50+)

Details

(URL)

Attachments

(3 attachments, 15 obsolete attachments)

7.56 KB, patch
Details | Diff | Splinter Review
14.48 KB, patch
dao
: review+
Details | Diff | Splinter Review
36.98 KB, patch
Ehsan
: review+
dao
: review+
Details | Diff | Splinter Review
(Reporter)

Description

13 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

It has occurred to me that the Find toolbar in FireFox does not include an
option equivalent to IE's "Match whole word only" option.  I'm surprised I could
not locate a bug requesting this; if I missed it I appologize.  It's a simple
enough feature, rather than finding substrings containing the search string, it
only displays whole strings containing the search string.  I believe in regular
practice, a "whole string" is denoted by a string of characters delimited by
whitespace or a non-alphanumeric character.  It's a very useful feature when
searching a page for text that often appears as a substring in other words.

Reproducible: Always
Steps to Reproduce:

Comment 1

13 years ago
can't find dupes, confirming. Though there's not much space in Find toolbar..
Status: UNCONFIRMED → NEW
Depends on: 14871
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All

Comment 2

13 years ago
It's a highly desirable feature. For example, it is desirable when searching
through a long list of bugs in Bugzilla on the query results page to see if a
bug has already been entered or not.

Comment 3

13 years ago
If space on the toolbar is an issue, it can be made undockable (should anyway!).
See bug 272491.

Comment 4

13 years ago
This is also a useful feature when searching through JavaDoc API info where a
substring match isn't what you want. For example, when searching for
"exception", you have to wade through too many results if you really want to
jump directly to the base class link for "Exception". 

Comment 5

12 years ago
I agree, though in a slightly different context.  As a Web Developer, I am
constantly using the "Find" to wade through the source code, and often times I
don't want every word that contains my search term - I want JUST the search term
by itself.  Have noticed this for a long time now but finally decided to see if
it was in Bugzilla as a bug/suggestion, and as it is, I thought I'd give my two
cents worth.

(In reply to comment #4)
> This is also a useful feature when searching through JavaDoc API info where a
> substring match isn't what you want. For example, when searching for
> "exception", you have to wade through too many results if you really want to
> jump directly to the base class link for "Exception". 

Comment 6

11 years ago
While this would certainly be a useful feature in a small set of cases, the benefit it gives isn't worth the additional UI clutter.  Marking WONTFIX as per Ben.
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → WONTFIX

Comment 7

11 years ago
I disagree. I think if you surveyed a bunch of experienced users you'd find that many people would appreciate such a feature -- in fact I switch to IE sometimes to use it. And it would hardly add any clutter at all -- just one more check box on the find toolbar.
(Reporter)

Comment 8

11 years ago
I disagree as well:  Every other browser I've used across multiple operating system features this option, as well as most other applications that have a search function.  There's a reason they do...  People use it!  I don't think the fact that the search bar has poor design (if you choose to use that argument) should be justification for killing this.  Personally, I have plenty of room on mine, but I'd rather the bar was redesigned than have to live without this basic feature.  This will be one of the reasons I'm moving to alternative browsers now I'm afraid.
*** Bug 348310 has been marked as a duplicate of this bug. ***
*** Bug 351430 has been marked as a duplicate of this bug. ***
*** Bug 243500 has been marked as a duplicate of this bug. ***

Updated

10 years ago
Duplicate of this bug: 410103

Comment 13

9 years ago
There is place enough on my findbar.

Comment 14

9 years ago
This bug has 9 votes, four duplicates, lots of supporters, and only one opponent who says "as per Ben" when there is no comment from any "Ben" in this thread.

Please could this bug be reopened?
The Ben in question was Ben Goodger. 9 votes and 4 duplicates doesn't really, IMO, count for "lots of supporters".

For people who mention "there's space on the toolbar", while true, the presence of space doesn't mean that adding pixels and UI clutter is without cost. Does adding a space after the word not effectively do the same thing? Or is the key that this is needed more when viewing source code in the View Source window?

To be honest, I'm really surprised that an add-on hasn't appeared for this. 

Comment 16

9 years ago
> The Ben in question was Ben Goodger.

Not sure what you're trying to say; there's still no justification for the closing of this bug, whether from any Ben or anyone else.

> 9 votes and 4 duplicates doesn't really, IMO, count for "lots of supporters".

True, maybe not lots -- but certainly enough. How many duplicates and votes did it take for Firefox to get the Find Toolbar itself? Probably none, yet it is popular. The same goes for many other features. The benefit of "whole words" search is huge; it is really no surprise that pretty much every application has this option in their "Find" dialog, including even things like Notepad.

> Does adding a space after the word not effectively do the same thing?

Most certainly not. Words can end in a space, a punctuation mark, a parenthesis, or even the end of a paragraph/table cell/whatever. The same goes for the beginning of the word.

> Or is the key that this is needed more when viewing source code in the View
> Source window?

It would certainly be useful there, but I have encountered multiple usecases where it is required on an actual website. Surely this is irrelevant though; implementing this feature would benefit both websites and view source equally.

> To be honest, I'm really surprised that an add-on hasn't appeared for this. 

Solutions to this dilemma exist, but are extremely techy and therefore out of the reach of most normal people. For example, I have a JavaScript bookmarklet that highlights regexp matches on a website. I can therefore use \b...\b to fake a "whole word search". How many people would be able to come up with this "solution"?
(Reporter)

Comment 17

9 years ago
I'm still a strong proponent of getting this added to FF.

With that said, I just had an alternative idea that I think may just be crazy enough to work.  What if we added regex support to the existing find box instead?  This would seem to meet the criteria of both sides of the argument:

1- The power users get their enhanced page search functionality built in.  In fact, a decent regex engine can be much more powerful than the original feature request.

2- The UI sticklers who are against adding anything to the find bar get their wish... since this feature is only in the backend, there are no additional toggles needed in the user interface.

Thoughts?

Comment 18

9 years ago
> What if we added regex support to the existing find box instead?

Hm. This regex support must certainly be off by default, as otherwise normal people wouldn't be able to search for things like +, *, (), etc. It could be made an option in about:config that is off by default.

However, I don't think this should be necessary. I still don't see any problem, even UI-wise, with the whole-words search option. If an extra checkbox doesn't fit on someone's toolbar, then they are no worse off than they are now, but for 99.99% of all users, the toolbar is big enough to fit this option.

I also don't think it's a slippery slope argument, because all other Find dialogs I see everywhere have an option for whole-words search. Any other option might be contentious because of the slippery slope, but not this one. It's just too essential and too ubiquitous to be missing.
Discussion of regex support is outside the scope of this bug. Please file a new bug for it and continue the discussion there. Remember, this is a public bug tracker, not a discussion forum. Spamming bugs with unrelated comments only makes it harder for Bugzilla to serve its intended purpose for those who use it.
(Reporter)

Comment 20

9 years ago
I know bloody well what this bug is about; I filed it!  In case you hadn't noticed, the original suggestion was WONTFIXED, which means we have basically zero chance of ever getting this functionality in its current form.  Since my primary concern in filing the bug was/is to obtain the functionality as documented, I see no problem with exploring alternative implementations.  

The purpose of this bug was/is to find a way to search for specific strings using the find toolbar.  The details of the final implementation, should one ever come to exist, are not yet determined, and are therefore open to suggestion.  It's in no way outside of the scope of what this bug is trying to accomplish to suggest new ways of solving the problem.

Furthermore, I find it highly hypocritical to presume to tell me what suggestions I'm allowed to make within my own bug report, while simultaneously not providing any bug-relevant material yourself.  That, sir, is the spam.
Product: Firefox → Toolkit
(Reporter)

Comment 21

9 years ago
The new IE 8 beta comes up with a novel solution for this.  Rather than adding all items directly into the search toolbar, they've chosen to go with an Options button which offers both Match Case and Match Whole Word, menu style.  This allows their toolbar to offer both features while consuming minimal horizontal toolbar real estate.

I'm going to reopen this bug for re-evaluation.  This is not out of disrespect for the developers who originally WONTFIXED it, but rather in hopes that in the nearly four years since it was originally suggested, perhaps enough has changed to make it feasible.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Updated

8 years ago
Duplicate of this bug: 505684

Comment 23

8 years ago
I vote for the feature.  It's plain ludicrous not to have it.  And frankly it makes me question the competency of the PHB's at Mozilla.
*puts on pointy-headed-wig* ;)

This really seems like heavy UI, which is not what we want in Firefox.  The overhead involved in toggling modes almost certainly eats up any advantage in reducing the result sets, especially since it's unlikely to be easy.

Nothing new was cited in reopening this report, just rehashing of old arguments. ("Some other browser does it now." doesn't count.)   The original rationale was sufficient while Ben was module owner, and as the current module owner I'm reinforcing that.

(As for "you can't tell me what to do with a bug I filed" that is patently false.  Bugzilla isn't so libertarian, sorry...  If a bug isn't something we're going to fix, we're going to mark it as WONTFIX, and unless you have something new and compelling, it should not be reopened.)
Status: REOPENED → RESOLVED
Last Resolved: 11 years ago8 years ago
Resolution: --- → WONTFIX

Comment 25

8 years ago
WAH. :|

If I knew C I would just do it myself but unfortunately I don't.  This feature is beyond trivial.  The only rational reason I can find for it not being done is laziness.  If that can be called rational.

Would u put it in if somebody handed in a patch?

Comment 26

8 years ago
I would like to see the arguments for and against weighed up reasonably.

From what I've seen, I remember quite a lot of arguments for it:
* added functionality, no loss of any existing functionality
* many people want it
* many applications (not only browsers) have it
* there is enough space in the Find toolbar

The only argument I've seen against it is "clutter" or "heavy UI".

Quite honestly, I'm having difficulty appreciating this argument against. What is the big problem? Perhaps someone can elaborate on this a bit so that normal users understand the reasoning?

Comment 27

8 years ago
Mike: I would like to understand why "Match case" deserves the space and the "UI clutter" but "Whole words" does not.

Please could you clarify whether you are implying that "Match case" is so much more useful than "Whole words"?
(In reply to comment #25)
> Would u put it in if somebody handed in a patch?

No. This has nothing to do with being lazy, but thanks for the veiled accusation! (If it were just that, we could leave the bug open forever, like the SeaMonkey bug.)

(In reply to comment #26)

> Quite honestly, I'm having difficulty appreciating this argument against. What
> is the big problem? Perhaps someone can elaborate on this a bit so that normal
> users understand the reasoning?

Good UI presents the controls needed, and doesn't show controls that won't be needed.  If we added options for everything that we could theoretically find space for, we wouldn't have very good UI.

(In reply to comment #27)
> Mike: I would like to understand why "Match case" deserves the space and the
> "UI clutter" but "Whole words" does not.
>
> Please could you clarify whether you are implying that "Match case" is so much
> more useful than "Whole words"?

I certainly didn't imply that, nor do I really believe it to be the case.  I actually use Match Case very infrequently, and because it persists, it probably is more trouble that it is worth.  It's been in the product since before 1.0, and unlike "match whole words only" is in basically every find UI I have ever used, so removing is a different discussion than adding something else.

Because we can isn't a good enough reason.  Because we already have something else of questionable value is not an excuse either.  (Down that path lies disaster.)  This just isn't a very useful feature in the general case, so implementing and supporting it is a net loss compared to investing that time elsewhere.

Comment 29

8 years ago
> Good UI presents the controls needed, and doesn't show controls that won't be
> needed.

Can you provide evidence to substantiate your claim that Find Whole Words "won't be needed" despite all the comments on this bug to the contrary? Personally, I need this feature pretty regularly; at the same time, I use "Match case" rather rarely and I've never used "Highlight All" at all.

> and unlike "match whole words only" is in basically every find UI I have ever
> used

Can you provide evidence to support this claim too? I have not seen a find UI without "whole words"; I'm not saying there aren't any, but it certainly indicates that they are rare. If you look hard enough, I'm sure you'll find a find UI without "match case" too.

> Because we can isn't a good enough reason.  Because we already have something
> else of questionable value is not an excuse either.

I used neither of those two arguments (I realise they are not good arguments). Yet I found four other arguments in favour (comment #26). Would you like to respond to those to explain why you think they "don't count"?

Comment 30

8 years ago
(In reply to comment #28)
> No. This has nothing to do with being lazy, but thanks for the veiled
> accusation! (If it were just that, we could leave the bug open forever, like
> the SeaMonkey bug.)

Bullheadedness then?  Callousness?  Apathy?
 
> Good UI presents the controls needed, and doesn't show controls that won't be
> needed. ... I
> actually use Match Case very infrequently, and because it persists, it probably
> is more trouble that it is worth.  It's been in the product since before 1.0,
> and unlike "match whole words only" is in basically every find UI I have ever
> used, so removing is a different discussion than adding something else.

By that logic u must remove Match Case and replace it with Whole Word.  I can't ever recall using it in a browser.  On the contrary I can recall many times when I needed Whole Word.  In fact I copied the entire text into another app so I could use it once!!  If u would just give up this defensiveness and put it in there I'm positive that u urself would find it very useful in a short amount of time.  I find ur resistance and discounting of many people's points to be childish, annoying, and just plain unprofessional.

Comment 31

8 years ago
Mike, I suggest that we ignore Chris Wagner's polemics and continue our reasonable discussion.

Comment 32

8 years ago
Created attachment 397259 [details]
Whole word only search code

Just to prove how trivial a fix this is, I wrote the following Perl function that performs a whole word only search without regular expressions.  This took 15 minutes.  Doing it in C would be similar.

Updated

8 years ago
Attachment #397259 - Attachment mime type: application/octet-stream → text/plain

Comment 33

8 years ago
Mike Connor, the fact that you are not responding to my concerns gives an impression that you do not really have any arguments and you are just acting on a whim. Is that the impression you intend to give?
Actually, no, but I've provided my arguments, you choose to discount them, and I don't think it is the best use of my time to argue endlessly when someone is absolutely convinced of the opposite of what I believe.

I believe this is the point where I must cite http://bugzilla.mozilla.org/page.cgi?id=etiquette.html and move on.  This bug is WONTFIX by module owner decision, and continuing to vent will not help your case further.  I suggest, if it is so trivial, that someone can easily create an extension for that set of people who deem this truly important.

Comment 35

7 years ago
Should actually be resolved duplicate of bug 14871.

Updated

7 years ago
No longer depends on: 14871

Updated

7 years ago
Blocks: 565552
Depends on: 14871

Comment 36

7 years ago
@Mike Connor: I would really like to see this marked as dupe of bug 14871, because the current situation makes the status of the "whole words" idea rather confusing.

In other words: is it a "no"? Then close 14871 (but I suspect it has too many supporters for that). Is it a "maybe"? Then this bug has the wrong resolution.

Updated

7 years ago
Duplicate of this bug: 14871

Comment 38

7 years ago
I strongly disagree with the -WONTFIX- idea.
Why would it be such a "non-trivial" fix, lo these 12 years ?

I see room for a "whole Word" check-box next to the "match case" check-box.
ScreenShot:  JeffRelf.F-M.FM/Whole.Word.FireFox.PNG

Comment 39

7 years ago
I don't understand why this is marked as WONTFIX but supported by your API - https://developer.mozilla.org/en/DOM/window.find 

Bug 481513 has a good testcase showing that aWholeWord is not working. Is there any good reason why this is not supported? Can't we have it as part of the MDC docs?
That's a separate bug.  Whether we would add this as a UI option is not at all related to whether the core implemention needs to support this.

I suspect it has never worked, and it's a Mozilla extension to an unspecified (DOM0) standard, so... meh.  I'll go poke MDC folks.
Bug 614976 filed to fix MDN.  Please direct followups regarding the window.find() API to bug 481513

Comment 42

7 years ago
If space or clutter is an issue, why not add a drop-down containing options like match case and whole word? See e.g. the toolbar used by IE.

Why do you refuse to add this simple option even if it is supported by browsers like IE for years? WONTFIX does not mean RESOLVED.

Updated

7 years ago
Blocks: 195485

Comment 43

7 years ago
Between this bug and Bug 14871 I see 61 votes, dozens of duplicates, and hundreds of comments.  There seems to be lots of energy here, can we please reopen this and get it implemented?  There is plenty of room on the search bar, it will not make the UI too cluttered (and if you really feel it will, then remove "match case" since this is more useful)

Comment 44

6 years ago
WONTFIX is an abrupt and unreasonable rejection of such a commonly needed feature. It's such a pain point, and I guess it'll be faster for me to learn how to write an extension for this than to have this included in FF...

Comment 45

5 years ago
The only objection to this bug seems to be that it may clutter UI, but there seems to be general consensus among the UX team that this is at least worth a strong consideration.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Comment 46

4 years ago
Concur! find "whole word" definitely more useful than "match case" and much needed!

Comment 47

4 years ago
I just finally signed up to BugZilla as I couldn't quite believe that this isn't a feature yet, so I just wanted to add my vote. I left it for so many years thinking that surely it was a soon-t-be-fixed oversight.

FWIW: Personally, on my desktop FF I have well over half the screen free. Over 2/3rds when the "Phrase not found" isn't showing.
(Assignee)

Updated

4 years ago
Assignee: bugzilla → nobody
(Assignee)

Comment 48

4 years ago
Created attachment 790488 [details] [diff] [review]
Patch WIP: whole word matching support for nsFind

Hi Ehsan! Yes, I'm crazy enough to take this bug.

I flagged you for feedback, because I'd really like you to look at the C++ code I wrote. I also hope you're, at least, mildly interested in this bug!

* To detect word separator 'characters' in PRUnichars, I nicked the isDOMWordSeparator function from extensions/spellcheck/src/mozInlineSpellWordUtil.cpp.
* I adjusted the loop in nsFind.cpp (how dare I!) the check if the match is surrounded by word separators. I know this is not optimal, but I'd like to implement loop-short-circuiting after we agree upon a correct approach to determine word boundaries. Regardless, this method does seem to handle a large query rather well, so it's usable for testing :).
* I know I changed the fingerprint of a few interfaces, but didn't change the uuids yet. Part of the WIP.
* I updated the findbar UI with an additional button to enable this new feature.

Have fun!
Assignee: nobody → mdeboer
Attachment #397259 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Attachment #790488 - Flags: feedback?(ehsan)
Comment on attachment 790488 [details] [diff] [review]
Patch WIP: whole word matching support for nsFind

Review of attachment 790488 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good overall, see my comments below.  I did not look at the front-end bits.  Please note that I need to spend some time to remember how nsFind works if you ever ask me for a real review, but if there are others who know that code these days, please feel free to ask them.

Thanks!

::: embedding/components/find/public/nsIWebBrowserFind.idl
@@ +70,2 @@
>       */
> +    attribute boolean wholeWord;

Is there a good reason to rename this attribute?

::: embedding/components/find/src/nsFind.cpp
@@ +51,5 @@
> +//    or email address in this, because those need to always fit into a single
> +//    DOM word.
> +
> +static bool
> +IsDOMWordSeparator(PRUnichar ch)

Copying this function like this sucks.  Please refactor it somewhere which you can share it in both places.  You can put it on nsContentUtils if you can't think of a better place.

@@ +1186,5 @@
>        {
> +        nsCOMPtr<nsIDOMNode> startParent;
> +
> +        int32_t beforeIndex = matchAnchorOffset - incr;
> +        int32_t afterIndex = matchAnchorOffset + patLen + incr;

What guarantees beforeIndex and afterIndex to not be out of bounds?

@@ +1193,4 @@
>  #ifdef DEBUG_FIND
> +        printf("INDEX COMPARISON:: start %d, last %d/%d", matchAnchorOffset, afterIndex, fragLen);
> +        PRUnichar firstChar = (t2b ? t2b[findex - pindex] : CHAR_TO_UNICHAR(t1b[findex - pindex]));
> +        PRUnichar lastChar = (t2b ? t2b[(findex - pindex) + (fragLen - 1)] : CHAR_TO_UNICHAR(t1b[(findex - pindex) + (fragLen - 1)]));

Ditto for |findex - pindex|.

@@ +1203,5 @@
>  #endif
> +        if (!mWholeWord || (
> +          (!beforeChar || matchAnchorOffset == 0 || IsDOMWordSeparator(beforeChar)) &&
> +          (!afterChar || afterIndex == fragLen || IsDOMWordSeparator(afterChar))
> +        ))

I'm not sure if I understand this condition...

@@ +1208,4 @@
>          {
> +#ifdef DEBUG_FIND
> +          printf("Found a match!\n");
> +#endif

Can you please submit a diff -w patch for review?  This is very difficult to review since I'm not sure what code you've actually changed.
Attachment #790488 - Flags: feedback?(ehsan)
(Assignee)

Comment 50

4 years ago
Pardon me for the late reply! Thanks for the excellent feedback!

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #49)
> This looks good overall, see my comments below.  I did not look at the
> front-end bits.  Please note that I need to spend some time to remember how
> nsFind works if you ever ask me for a real review, but if there are others
> who know that code these days, please feel free to ask them.

I don't know anyone else who knows that code (wild guess: perhaps someone from the TB team?), so I'm afraid you'll be my victim until then ;)

> Is there a good reason to rename this attribute?

No real good reason, except for consistency across our codebase. I couldn't find a consumer of this property throughout our codebase, but I didn't search through comm-central, for example.

> Copying this function like this sucks.  Please refactor it somewhere which
> you can share it in both places.  You can put it on nsContentUtils if you
> can't think of a better place.

I will. I just put the function here to check if you would agree with using it.

> What guarantees beforeIndex and afterIndex to not be out of bounds?

None, really. I'll check this.

> @@ +1203,5 @@
> >  #endif
> > +        if (!mWholeWord || (
> > +          (!beforeChar || matchAnchorOffset == 0 || IsDOMWordSeparator(beforeChar)) &&
> > +          (!afterChar || afterIndex == fragLen || IsDOMWordSeparator(afterChar))
> > +        ))
> 
> I'm not sure if I understand this condition...

In words: if not searching for whole words only, fall through and register the find. When searching for whole words only, check if
1) the character BEFORE the first character of the match is a word-separator OR if the start of the match is from the start of the textNode.
AND
2) the character AFTER the last character of the match is a word-separator OR if the end of the match is at the end of the textNode.

So nsFind traverses through the DOM, plucks the text nodes (fragments) from it to search through. This does indeed mean that words spanning multiple elements, eg. <h1>supercalifragilistic<i>ex</i>pialidocious</h1>, will not match.

> Can you please submit a diff -w patch for review?  This is very difficult to
> review since I'm not sure what code you've actually changed.

I will. It may take a while, because I'm not a C++ developer. I can only promise to try my best.
(In reply to comment #50)
> Pardon me for the late reply! Thanks for the excellent feedback!
> 
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #49)
> > This looks good overall, see my comments below.  I did not look at the
> > front-end bits.  Please note that I need to spend some time to remember how
> > nsFind works if you ever ask me for a real review, but if there are others
> > who know that code these days, please feel free to ask them.
> 
> I don't know anyone else who knows that code (wild guess: perhaps someone from
> the TB team?), so I'm afraid you'll be my victim until then ;)

Haha, OK.

> > Is there a good reason to rename this attribute?
> 
> No real good reason, except for consistency across our codebase. I couldn't
> find a consumer of this property throughout our codebase, but I didn't search
> through comm-central, for example.

I'd prefer if you at least did that part in a separate patch.

> > Copying this function like this sucks.  Please refactor it somewhere which
> > you can share it in both places.  You can put it on nsContentUtils if you
> > can't think of a better place.
> 
> I will. I just put the function here to check if you would agree with using it.
> 
> > What guarantees beforeIndex and afterIndex to not be out of bounds?
> 
> None, really. I'll check this.
> 
> > @@ +1203,5 @@
> > >  #endif
> > > +        if (!mWholeWord || (
> > > +          (!beforeChar || matchAnchorOffset == 0 || IsDOMWordSeparator(beforeChar)) &&
> > > +          (!afterChar || afterIndex == fragLen || IsDOMWordSeparator(afterChar))
> > > +        ))
> > 
> > I'm not sure if I understand this condition...
> 
> In words: if not searching for whole words only, fall through and register the
> find. When searching for whole words only, check if
> 1) the character BEFORE the first character of the match is a word-separator OR
> if the start of the match is from the start of the textNode.
> AND
> 2) the character AFTER the last character of the match is a word-separator OR
> if the end of the match is at the end of the textNode.
> 
> So nsFind traverses through the DOM, plucks the text nodes (fragments) from it
> to search through. This does indeed mean that words spanning multiple elements,
> eg. <h1>supercalifragilistic<i>ex</i>pialidocious</h1>, will not match.

I don't think that's acceptable at all.  Authoring tools can inject things like empty spans inside words which are invisible for all intents and purposes.

> > Can you please submit a diff -w patch for review?  This is very difficult to
> > review since I'm not sure what code you've actually changed.
> 
> I will. It may take a while, because I'm not a C++ developer. I can only
> promise to try my best.

Sure.  FWIW, producing the diff itself is very easy, just use hg/git diff -w.  :-)
(Assignee)

Comment 52

3 years ago
Unassigning here until I have time for findbar related work again, after Australis.
Assignee: mdeboer → nobody
Status: ASSIGNED → NEW

Updated

3 years ago
Whiteboard: [feature] p=0

Updated

3 years ago
Blocks: 950073

Updated

3 years ago
No longer blocks: 950073
Flags: firefox-backlog+
Whiteboard: [feature] p=0 → p=0
Duplicate of this bug: 528678
Duplicate of this bug: 518082
Duplicate of this bug: 515027
Duplicate of this bug: 322897
Duplicate of this bug: 252262
Duplicate of this bug: 198922
Duplicate of this bug: 192934
Duplicate of this bug: 156517
Duplicate of this bug: 143180
Duplicate of this bug: 118118
Duplicate of this bug: 104046
Duplicate of this bug: 15055
**********************************************************************
If you want this bug 269442 to be fixed, please vote for it:

https://bugzilla.mozilla.org/page.cgi?id=voting/user.html&bug_id=269442#vote_269442
("vote" link in the Importance field at the top of this bug 269442).
**********************************************************************

Sorry for bugspam, but unfortunately bugzilla doesn't transfer duplicates nor votes automatically and there's no way to mass-change duplicate resolutions.

(In reply to Thomas D. from comment #78)
> Moved 12 duplicates to ... bug 269442, to get the record correct there.
> ...

So the total number of dupes is now 18.
At the time of this comment (2014-05-10), this bug 269442 has 40 votes, and bug 14871 has 40 votes (intersections possible).

> Maybe one day it'll dawn on them that this ridiculous lack of basic
> functionality in the primary UI should be fixed, 15 years after this was
> first filed. Sad.
Hi Mike, now that Australis is done (and looking at 40+ votes and 18 duplicates flowing in since time immemorial), I think there's a lot of users out there who would highly appreciate if you could complete your fine patch here... please... TIA. :)

(In reply to Mike de Boer [:mikedeboer] from comment #52)
> Unassigning here until I have time for findbar related work again, after
> Australis.

FTR:

1999: Problem of this bug 269442 first filed as Bug 14871
(... 5 years time lapse...)
2004: Bug 14871 attachment 156045 [details] [diff] [review] - patch for this bug, almost there (see Bug 14871 Comment 57)
(... 9 years time lapse, dozens of votes and duplicates...)
2013: Bug 269442 attachment 790488 [details] [diff] [review] - patch for this bug, almost there (see Bug 269442 Comment 49)
2014: ... ?
Flags: needinfo?(mdeboer)
Blocks: 998343
Blocks: 998801
(Assignee)

Comment 67

3 years ago
Hi Thomas! I highly appreciate you Bugzilla-archaeology efforts, especially for finding bug 14871, which I didn't see before. I'll definitely take a look at the approach taken there.

Australis might be released, but it's not done! So I'll still be spending a fair amount of time on that and other projects.

However, I also recognize that this is a big missing feature and I'd like to have it in sooner rather than later. As soon as I find the time I will pick this up again.
Flags: needinfo?(mdeboer)
(Assignee)

Comment 68

3 years ago
Created attachment 8425325 [details] [diff] [review]
Patch v1: Whole word matching support for nsFind

This is a complete redo of my previous patch, after I discovered the work done by Ben Howe in bug 14871.

I modernized the code, addressed the comments from sbr, which makes this a fully functional, complete patch (I *think* ;-) ).

I did some merging of my previous WIP patch here, but most of the credits should go to Howe.
Assignee: nobody → mdeboer
Attachment #790488 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8425325 - Flags: review?(ehsan)
(Assignee)

Comment 69

3 years ago
Created attachment 8425326 [details] [diff] [review]
Patch v1 w/o whitespace: Whole word matching support for nsFind
(Assignee)

Comment 70

3 years ago
Created attachment 8425335 [details] [diff] [review]
Patch v1: Whole word matching support for nsFind

Holy whitespace, Batman!
Attachment #8425325 - Attachment is obsolete: true
Attachment #8425325 - Flags: review?(ehsan)
Attachment #8425335 - Flags: review?(ehsan)
(Assignee)

Comment 71

3 years ago
Created attachment 8425336 [details] [diff] [review]
Patch v1 w/o whitespace: Whole word matching support for nsFind
Attachment #8425326 - Attachment is obsolete: true
Hi Mike, are you taking this bug for Iteration 32.2?  If so, can you provide a point value.  Thanks.
Flags: needinfo?(mdeboer)
(Assignee)

Comment 73

3 years ago
Marco, that's hard to say... I'm working on this bug mainly in my spare time, because it'll span multiple iterations due to code archaeology, complexity and review cycles, on which this bug is _very_ dependent.

So I'm not willing to commit to this for 32.2, but a point value is not a problem :)
Flags: needinfo?(mdeboer)
Whiteboard: p=0 → p=13
Thanks for the update Mike.  Would it be worthwhile breaking this bug down?
(Assignee)

Comment 75

3 years ago
(In reply to Marco Mucci [:MarcoM] from comment #74)
> Thanks for the update Mike.  Would it be worthwhile breaking this bug down?

No, I *think* the hard part is done, if Ehsan is more or less happy with the approach in attachment 8425335 [details] [diff] [review].

The points are mainly based on the difficulty of the implementation.
Hey Mike, do you have an interdiff from the previous version of the patch that I reviewed?
Flags: needinfo?(mdeboer)
Fun fact, the pdf.js viewer will have an added difficulty implementing this search mode, because of https://github.com/mozilla/pdf.js/issues/2806 and a few other similar issues. Basically, what are whitespaces for the viewer is technically not always the case for the viewer.
(Assignee)

Comment 78

3 years ago
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #76)
> Hey Mike, do you have an interdiff from the previous version of the patch
> that I reviewed?

An interdiff is not really useful, I'm afraid :/ I redid the feature from scratch...

So there's no shared context between this patch and the previous one you provided feedback on, helas!
Flags: needinfo?(mdeboer)
Comment on attachment 8425335 [details] [diff] [review]
Patch v1: Whole word matching support for nsFind

Review of attachment 8425335 [details] [diff] [review]:
-----------------------------------------------------------------

So I basically stopped looking at the patch after the comment about undoing the stuff that NextNode() has done.  Can you please explain your approach and why you chose it?  It's been years since I've looked at this code so not all of the details involved might be fresh in my mind...

::: embedding/components/find/src/nsFind.cpp
@@ +245,5 @@
>  nsresult
>  nsFindContentIterator::PositionAt(nsINode* aCurNode)
>  {
>    nsINode* oldNode = mOuterIterator->GetCurrentNode();
> +  Reset();

I don't think we should reset the iterator here.  See below.

@@ +739,5 @@
>  #endif
>    return NS_OK;
>  }
>  
> +char16_t nsFind::GetCharFromNextNode(nsIDOMRange* aSearchRange,

Nit: please put the return type on its own line.

@@ +746,5 @@
> +                                     bool aContinueOk)
> +{
> +  int32_t saveIterOffset = mIterOffset;
> +  nsCOMPtr<nsIDOMNode> saveIterNode = mIterNode;
> +  nsINode* saveCurrNode = mIterator->GetCurrentNode();

Nit: nsCOMPtr<nsINode>.

@@ +748,5 @@
> +  int32_t saveIterOffset = mIterOffset;
> +  nsCOMPtr<nsIDOMNode> saveIterNode = mIterNode;
> +  nsINode* saveCurrNode = mIterator->GetCurrentNode();
> +
> +  nsCOMPtr<nsIContent> tc = nullptr;

Nit: please move the declaration further down to where you assign to this variable.

@@ +764,5 @@
> +  {
> +    // Restore necessary member variables
> +    mIterOffset = saveIterOffset;
> +    mIterNode = saveIterNode;
> +    mIterator->PositionAt(saveCurrNode);

I think you added that Reset call above to neutralize the changes to mIterator as a result of the NextNode call above, right?

I don't like this approach in general since it's hard to guarantee that we are wallpapering over everything that NextNode() might have done correctly.  Can you please convince me why this is correct?

::: toolkit/components/typeaheadfind/nsITypeAheadFind.idl
@@ +63,5 @@
>  
>    readonly attribute AString searchString;
>                                          // Most recent search string
>    attribute boolean caseSensitive;      // Searches are case sensitive
> +  attribute boolean entireWord;         // Search for whole words only

Please rev the uuid.
Comment on attachment 8425335 [details] [diff] [review]
Patch v1: Whole word matching support for nsFind

Clearing the request to make it not show up in my queue for now, please reset it when you answer comment 79.  Thanks!
Attachment #8425335 - Flags: review?(ehsan)
(Assignee)

Comment 81

3 years ago
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #79)
> I think you added that Reset call above to neutralize the changes to
> mIterator as a result of the NextNode call above, right?
> 
> I don't like this approach in general since it's hard to guarantee that we
> are wallpapering over everything that NextNode() might have done correctly. 
> Can you please convince me why this is correct?

I'll try ;)

First off, I don't think we need the `Reset()` in `PositionAt()` really. Invoking `::PositionAt()` on the inner and outer iterator will do the trick just as well.

What `GetCharFromNextNode()` really tries to do is peek ahead on the node iterator to fetch the next node and it's first char to see if it will break there. So a better name might be `PeekAtCharFromNextNode()`(?).

In Ben Howe's implementation a pointer to the Iterator was saved, but that didn't really make sense, as 'rbs' pointed out succinctly: https://bugzilla.mozilla.org/show_bug.cgi?id=14871#c57

To peek at the next node in the Iterator's chain I _do_ think it's a good idea to
1. Save the current state
2. Fast-forward 1 step and grab the next char
3. Rollback to the previous state
...and use the mechanisms at hand to accomplish this.

What do you think?
Flags: needinfo?(ehsan)
(In reply to Mike de Boer [:mikedeboer] from comment #81)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> #79)
> > I think you added that Reset call above to neutralize the changes to
> > mIterator as a result of the NextNode call above, right?
> > 
> > I don't like this approach in general since it's hard to guarantee that we
> > are wallpapering over everything that NextNode() might have done correctly. 
> > Can you please convince me why this is correct?
> 
> I'll try ;)
> 
> First off, I don't think we need the `Reset()` in `PositionAt()` really.
> Invoking `::PositionAt()` on the inner and outer iterator will do the trick
> just as well.

OK, please do that!

> What `GetCharFromNextNode()` really tries to do is peek ahead on the node
> iterator to fetch the next node and it's first char to see if it will break
> there. So a better name might be `PeekAtCharFromNextNode()`(?).

How about PeekNextChar?

> In Ben Howe's implementation a pointer to the Iterator was saved, but that
> didn't really make sense, as 'rbs' pointed out succinctly:
> https://bugzilla.mozilla.org/show_bug.cgi?id=14871#c57
> 
> To peek at the next node in the Iterator's chain I _do_ think it's a good
> idea to
> 1. Save the current state
> 2. Fast-forward 1 step and grab the next char
> 3. Rollback to the previous state
> ...and use the mechanisms at hand to accomplish this.
> 
> What do you think?

Hmm, I still have a bit of an uneasy feeling about this approach.  I think a better overall solution would be to rework nsFind::Find in a way that makes it know how to peek into the next word if needed, and reuse the information from the peek in the next invocation.  But that code is really hairy and I'm afraid of touching it.  :/  At the lack of that, I can't think of a better idea...
Flags: needinfo?(ehsan)
Comment on attachment 8425335 [details] [diff] [review]
Patch v1: Whole word matching support for nsFind

Review of attachment 8425335 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't really review the front-end bits carefully.

Also, this needs tests.  *Lots* of them! :-)

And thanks for working on this!

::: embedding/components/find/src/nsFind.cpp
@@ +460,5 @@
>  
>  nsFind::nsFind()
>    : mFindBackward(false)
>    , mCaseSensitive(false)
> +  , mWordBreaker(0)

Nit: nullptr.

@@ +742,5 @@
>  
> +char16_t nsFind::GetCharFromNextNode(nsIDOMRange* aSearchRange,
> +                                     nsIDOMRange* aStartPoint,
> +                                     nsIDOMRange* aEndPoint,
> +                                     bool aContinueOk)

You only pass false to aContinueOK so you might as well remove it.

@@ +756,5 @@
> +  tc = do_QueryInterface(mIterNode);
> +
> +  // Get the block parent
> +  nsCOMPtr<nsIDOMNode> blockParent;
> +  GetBlockParent(mIterNode, getter_AddRefs(blockParent));

You need to handle the failure of this method.

@@ +764,5 @@
> +  {
> +    // Restore necessary member variables
> +    mIterOffset = saveIterOffset;
> +    mIterNode = saveIterNode;
> +    mIterator->PositionAt(saveCurrNode);

Just move this code to before the condition block and get rid of the copy down below.

@@ +766,5 @@
> +    mIterOffset = saveIterOffset;
> +    mIterNode = saveIterNode;
> +    mIterator->PositionAt(saveCurrNode);
> +
> +    return (char16_t) 0;

Nit: L'\0'.

@@ +778,5 @@
> +  const nsTextFragment *frag = nullptr;
> +  int32_t fragLen = 0;
> +
> +  frag = tc->GetText();
> +  fragLen = frag->GetLength();

No need to init and assign these separately, please do that in one go.

@@ +782,5 @@
> +  fragLen = frag->GetLength();
> +
> +  // I don't see this happening, but I suppose an empty node could cause this.
> +  if (fragLen <= 0)
> +    return (char16_t) 0;

Yes, an empty node can cause this, but why are you just returning a failure code here?  You should look for the next node instead.

@@ +1147,5 @@
>        return NS_OK;
>      }
>  
> +    // Save the previous character for word boundary detection
> +    prevChar = c;

This changes the meaning of the existing code, since the current code never sets prevChar to soft-hyphen.

@@ +1225,3 @@
>      {
> +      if (prevChar == NBSP_CHARCODE)
> +        prevChar = CHAR_TO_UNICHAR(' ');

Why do you have to do this?

@@ +1233,5 @@
> +    // a) we're not matching the entire word
> +    // b) a match has already been stored
> +    // c) the previous character is a different "class" than the current character
> +    if ( (c == patc && (!mWordBreaker || matchAnchorNode || wordBreakPrev)) ||
> +        (inWhitespace && IsSpace(c)) )

Nit: no spaces around parens please.

@@ +1268,5 @@
> +          int32_t nextfindex = findex + incr;
> +
> +          // If still in array boundaries, get nextChar
> +          if ((nextfindex >= 0) && (nextfindex < fragLen))
> +            nextChar = (t2b ? t2b[nextfindex] : CHAR_TO_UNICHAR(t1b[nextfindex]));

nextChar can just be a variable local to this block.

@@ +1274,5 @@
> +          else
> +            nextChar = GetCharFromNextNode(aSearchRange, aStartPoint, aEndPoint, false);
> +
> +          if (nextChar == NBSP_CHARCODE)
> +            nextChar = CHAR_TO_UNICHAR(' ');

Why do you need to do this?

@@ +1280,5 @@
> +        }
> +
> +        // If a word break isn't there when it needs to be, flag and reset search
> +        if (mWordBreaker && !wordBreakNext) {
> +          startParent = nullptr;

I'm struggling with this part of the logic.  Can you please explain this?

@@ +1320,5 @@
>  
>          if (startParent) {
>            // If startParent == nullptr, we didn't successfully make range
>            // or, we didn't make a range because the start or end node were invisible
> +          // or, a word break wasn't present when mWordBreaker was set.

Nit: drop the period please.

::: toolkit/content/widgets/findbar.xml
@@ +610,5 @@
> +
> +      <!--
> +        - Sets the findbar entire-word mode
> +        - @param aEntireWord (boolean)
> +        - Whether or not case-sensitivity should be turned on.

Please fix the comment.
Attachment #8425335 - Flags: review-
(Assignee)

Comment 84

3 years ago
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #82)
> > First off, I don't think we need the `Reset()` in `PositionAt()` really.
> > Invoking `::PositionAt()` on the inner and outer iterator will do the trick
> > just as well.
> 
> OK, please do that!

What I meant was that `nsFindContentIterator::PositionAt` already invokes `::PositionAt()` on the inner and outer iterator, so I don't need to do anything here, just remove the `Reset()` call.


(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #83)
> Also, this needs tests.  *Lots* of them! :-)

Aye! That's the plan.

> @@ +1147,5 @@
> >        return NS_OK;
> >      }
> >  
> > +    // Save the previous character for word boundary detection
> > +    prevChar = c;
> 
> This changes the meaning of the existing code, since the current code never
> sets prevChar to soft-hyphen.

No, I introduced a different variable called `prevCharInMatch` that takes over the previous role of `prevChar` in the meaning of the existing code. `prevChar` is now used in a broader scope.

> 
> @@ +1225,3 @@
> >      {
> > +      if (prevChar == NBSP_CHARCODE)
> > +        prevChar = CHAR_TO_UNICHAR(' ');
> 
> Why do you have to do this?

AFAIK, nsWordBreaker instances do not recognize HTML entities.

> @@ +1280,5 @@
> > +        }
> > +
> > +        // If a word break isn't there when it needs to be, flag and reset search
> > +        if (mWordBreaker && !wordBreakNext) {
> > +          startParent = nullptr;
> 
> I'm struggling with this part of the logic.  Can you please explain this?

Since we're already inside the block of a match that adheres to the condition 'this match starts as a word and it matches the whole search query', the only thing left is to check if the match also ends as a word.
If the match does not end as a word, it's invalid after all and we can continue the loop in search of another match.

However, it should be written as:

```cpp
// Make the range if a word break is there when it needs to be
if (!mWordBreaker || wordBreakNext) {
  ...
```

Or I could change the block above this, instead if introducing yet another indirection:

```cpp
if (!mWordBreaker->BreakInBetween(&c, 1, &nextChar, 1)) {
  matchAnchorNode = nullptr;
  continue;
}
```

... which I *think* is the cleanest of the two.
(Assignee)

Comment 85

3 years ago
Created attachment 8447189 [details] [diff] [review]
Patch v1.1: Whole word matching support for nsFind

Ehsan, I implemented most of your review comments in this patch, except two things:

1. Tests.
2. In `nsFind::PeekNextChar` at `(fragLen <= 0)` you mentioned that I should go fetch the next node. I'm not sure how to approach this... should I make PeekNextChar recursive? How would I maintain the 'hack' of restoring the member variables after calling `nsFind::NextNode`?
Attachment #8425335 - Attachment is obsolete: true
Attachment #8447189 - Flags: feedback?(ehsan)
Can you please provide an interdiff?  Thanks!
Flags: needinfo?(mdeboer)
(Assignee)

Comment 87

3 years ago
Created attachment 8447318 [details] [diff] [review]
Patch v1.1: interdiff
Flags: needinfo?(mdeboer)
(Assignee)

Comment 88

3 years ago
Created attachment 8447319 [details] [diff] [review]
Patch v1.1: interdiff
Attachment #8447318 - Attachment is obsolete: true
Mike, can you also provide some info on the changes you've made since the last time to help me review this easier?  And where are the tests mentioned in comment 85?
Flags: needinfo?(mdeboer)
(Assignee)

Comment 90

3 years ago
Comment on attachment 8447319 [details] [diff] [review]
Patch v1.1: interdiff

Review of attachment 8447319 [details] [diff] [review]:
-----------------------------------------------------------------

Apologize for providing the info a few days later, but I think you understand already ;)

::: b/embedding/components/find/src/nsFind.cpp
@@ +740,5 @@
>  }
>  
> +char16_t nsFind::PeekNextChar(nsIDOMRange* aSearchRange,
> +                              nsIDOMRange* aStartPoint,
> +                              nsIDOMRange* aEndPoint)

Here I changed the name to `PeekNextChar` instead of `GetCharFromNode` and removed the `aContinueOk` argument, since it wasn't used.

@@ +745,4 @@
>  {
>    int32_t saveIterOffset = mIterOffset;
>    nsCOMPtr<nsIDOMNode> saveIterNode = mIterNode;
> +  nsCOMPtr<nsINode> saveCurrNode = mIterator->GetCurrentNode();

As per your suggestion, I changed this to be a COMPtr.

@@ -754,3 @@
>  
>    // Get the text content:
> -  tc = do_QueryInterface(mIterNode);

I now declare `tc` at the point where it's used.

@@ +754,4 @@
>  
>    // Get the block parent
>    nsCOMPtr<nsIDOMNode> blockParent;
> +  nsresult rv = GetBlockParent(mIterNode, getter_AddRefs(blockParent));

This can fail, so I capture that now and deal with that below.

Before any of the `return L'\0'` escape hatches are used, I now reset the iterator state to what it was before `::NextNode()` was called to ensure there are no side-effects when using this method.

@@ +775,2 @@
>    if (fragLen <= 0)
> +    return L'\0';

You said that I should peek at the next node next if we encounter an empty node. This is what I mentioned to be struggling with... a simple call to `::NextNode()`, resetting the state again afterwards, could yield an another empty node, so I'd effectively need to call this method, `PeekNextChar`, recursively. But I already reset the state of the iterator to its original position _before_ I get here, so do I need start capturing the state in member variables with a method called `PeekNextNode` or `StartNodePeek`, for example, and call a method called `StopNodePeek` when `PeekNextChar` returns a value?

@@ +1222,5 @@
>      // a) we're not matching the entire word
>      // b) a match has already been stored
>      // c) the previous character is a different "class" than the current character
> +    if ((c == patc && (!mWordBreaker || matchAnchorNode || wordBreakPrev)) ||
> +        (inWhitespace && IsSpace(c)))

This is just a formatting change.

@@ +1268,5 @@
> +
> +          // If a word break isn't there when it needs to be, reset search
> +          if (!mWordBreaker->BreakInBetween(&c, 1, &nextChar, 1)) {
> +            matchAnchorNode = nullptr;
> +            continue;

I realized that at this point we can just bail out early here, instead of wrapping the range handling block below in another block. Also nice for retaining blame info ;)

@@ +1301,5 @@
> +            *aRangeRet = range.get();
> +            NS_ADDREF(*aRangeRet);
> +          }
> +          else {
> +            startParent = nullptr; // This match is no good -- invisible or bad range

IOW, I didn't need to change a thing in this block! Pfew!

@@ +1307,5 @@
>          }
>  
>          if (startParent) {
>            // If startParent == nullptr, we didn't successfully make range
>            // or, we didn't make a range because the start or end node were invisible

If no word break was present at the end of a match, we didn't even get here, so no need for this comment.
(Assignee)

Updated

3 years ago
Flags: needinfo?(mdeboer)
(In reply to Mike de Boer [:mikedeboer] from comment #90)
> @@ +775,2 @@
> >    if (fragLen <= 0)
> > +    return L'\0';
> 
> You said that I should peek at the next node next if we encounter an empty
> node. This is what I mentioned to be struggling with... a simple call to
> `::NextNode()`, resetting the state again afterwards, could yield an another
> empty node, so I'd effectively need to call this method, `PeekNextChar`,
> recursively. But I already reset the state of the iterator to its original
> position _before_ I get here, so do I need start capturing the state in
> member variables with a method called `PeekNextNode` or `StartNodePeek`, for
> example, and call a method called `StopNodePeek` when `PeekNextChar` returns
> a value?

We basically want to make it impossible to forget to reset the state, but do that after everything else is done.  We can use RAII for that, in that you'd have a class let's say called PeekNextCharRestoreState which takes an nsFind reference and resets mIter* members on it in its destructor, then inside PeekNextChar after GetBlockParent, you instantiate the class on the stack, initializing it with *this, and the destructor will run immediately before returning and will do the cleanup no matter where you return from.  With this in place you can just keep looking for the next node if fragLen is 0.  Does that make sense?
Comment on attachment 8447189 [details] [diff] [review]
Patch v1.1: Whole word matching support for nsFind

Review of attachment 8447189 [details] [diff] [review]:
-----------------------------------------------------------------

I think this makes sense.  But we do need tests here!  ;-)
Attachment #8447189 - Flags: feedback?(ehsan) → feedback+
(Assignee)

Comment 93

3 years ago
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #91)
> Does that make sense?

Absolutely! Thanks :)

Oh, and tests are in the making. Next patch will have those attached.
(Assignee)

Comment 94

2 years ago
Created attachment 8542972 [details] [diff] [review]
Patch v1.2: Whole word matching support for nsFind

This patch cleans up things quite a bit, thanks to Ehsan's suggestions (using RAII to safely do look-ahead matching works great).

Since the directory structure of embedding/ changed significantly, an interdiff patch is quite difficult to accomplish, I'm afraid.

However, there's plenty of time to read through this, because I've got plenty of tests to write.
Attachment #8425336 - Attachment is obsolete: true
Attachment #8447189 - Attachment is obsolete: true
Attachment #8447319 - Attachment is obsolete: true

Comment 95

2 years ago
Thanks for making a patch, but I'm not about to figure out how to recompile Firefox to get a feature that should be in there already, and I'm a professional software developer. The fact that this feature is not in the normal release after all these years is incomprehensible.
Mike, thank you for making great inroads into addressing this significant usability gap of "Find in page" functionality as used in Firefox and other toolkit consumers like Thunderbird. I was under the impression that your last patch (5 months ago, attachment 8542972 [details] [diff] [review]) was at an advanced state, basically "only" lacking tests ;p I and many others who voted for this or filed duplicates would hate to see your fine work here bitrot, one year after your commitment that "this is a big missing feature and I'd like to have it in sooner rather than later"... After 16 years of waiting for this feature, some of us are getting tired the extent that they are dumping Firefox altogether (see comment below), because other browsers like IE *do* have this feature...

Way forward:

Concerning your last patch (attachment 8542972 [details] [diff] [review]):
* Do you need feedback or review from someone? (you said "there's plenty of time to read through this", but no one was asked to read it)
* Is there an agreement with :Ehsan that the interdiff as desired by him is no longer required? (as you said "the directory structure of embedding/ changed significantly, an interdiff patch is quite difficult to accomplish")
* Have you had a chance to start writing tests? (you said on 2014-12-31, "I've got plenty of tests to write") Can we get away with moving out tests into another bug? (Probably the current implementation does NOT have sufficient tests, either?)
* Can you provide or point us to some explanation of the current point value (p=13) for this bug?

I truly appreciate your efforts here, and it's delicate coding in the internals which not many of us are up to. That's why we depend on you and are still (...) waiting eagerly...! :)

(In reply to Mike de Boer [:mikedeboer] from comment #67 on 2014-05-12)
> Hi Thomas! I highly appreciate you Bugzilla-archaeology efforts, especially
> for finding bug 14871, which I didn't see before. I'll definitely take a
> look at the approach taken there.
> ...
> However, I also recognize that this is a big missing feature and I'd like to
> have it in sooner rather than later. As soon as I find the time I will pick
> this up again.

(In reply to Mike de Boer [:mikedeboer] from comment #94 on 2014-12-31)
> Created attachment 8542972 [details] [diff] [review]
> Patch v1.2: Whole word matching support for nsFind
> 
> This patch cleans up things quite a bit, thanks to Ehsan's suggestions
> (using RAII to safely do look-ahead matching works great).
> 
> Since the directory structure of embedding/ changed significantly, an
> interdiff patch is quite difficult to accomplish, I'm afraid.
> 
> However, there's plenty of time to read through this, because I've got
> plenty of tests to write.

(In reply to bobagadush from bug 14871 comment #79 on 2015-05-13)
> I spent about half an hour trying to figure out where the 'whole word only'
> option was in Firefox's search. I eventually ended up here, where I was
> shocked to find that not only is there no such option, but this was - and
> still is - intentional. WTF.
> 
> I created an account just to vote for this issue. This is ridiculous beyond
> belief. I've switched to other browsers just to get this before, and I will
> again today.
> 
> I feel insulted.
Needinfo from Mike :)

(In reply to Thomas D. from comment #96)
> Mike, thank you for making great inroads into addressing this significant
> usability gap of "Find in page" functionality as used in Firefox and other
> toolkit consumers like Thunderbird. I was under the impression that your
> last patch (5 months ago, attachment 8542972 [details] [diff] [review])
> was at an advanced state, basically "only" lacking tests ;p I and many
> others who voted for this or filed duplicates would hate to see your fine
> work here bitrot, one year after your commitment that "this is a big missing
> feature and I'd like to have it in sooner rather than later"... After 16
> years of waiting for this feature, some of us are getting tired the extent
> that they are dumping Firefox altogether (see comment below), because other
> browsers like IE *do* have this feature...
> 
> Way forward:
> [snip... -> See comment 96!]
Flags: needinfo?(mdeboer)
Keywords: ux-efficiency
http://superuser.com/questions/131696/what-do-you-do-when-you-need-whole-words-search-in-firefox
http://superuser.com/questions/792764/firefox-single-word-search

Fwiw (and as stated before) there's no fully successful workaround for this bug:
* using "add a space" methods has a flaw -- like searching for " time ", it won't find "time. " or "time, " or "time" at the beginning of a line, etc... Words can end in a space, a punctuation mark, a parenthesis, or even the end of a paragraph/table cell/whatever. The same goes for the beginning of the word.
* using "Match case" will only work as a workaround in exceptional cases like when searching for "GE" for Georgia and avoid finding "general", "generation", "generic", "geek" etc.
(Assignee)

Comment 99

2 years ago
Hi Thomas, you're absolutely right that it's been too long since I've made any progress here. The patch here has probably bitrotted, so I'll post an un-bitrotted version soon.

Tests are relatively easy to do, the work involved is:
 - create a new mochitest-chrome test file for whole-word search. A mochitest-browser test might be easier to load various test pages with and has a better test framework
 - add tests that cover most, if not all, variations possible we can think of

If it counts for anything, my excuse for not moving forward here is because I was pulled in a different direction to high-prio projects like Firefox Hello. This simply consumes most, if not all, of my time. This doesn't mean I'll never finish this, in fact: super kind pings like yours add the necessary motivation to spend my spare time here! Oh so human ;-)

So thank you for your positive way of keeping us on our toes and kind ping!
Flags: needinfo?(mdeboer)
(Assignee)

Updated

2 years ago
Points: --- → 13
Whiteboard: p=13
(Assignee)

Updated

2 years ago
Flags: qe-verify+
(Assignee)

Comment 100

2 years ago
Created attachment 8620963 [details] [diff] [review]
Patch 1.3: Whole word matching support for nsFind

This is an unbitrotted and improved version of the patch I attached previously.
Attachment #8542972 - Attachment is obsolete: true
Attachment #8620963 - Flags: review?(ehsan)
(Assignee)

Updated

2 years ago
Attachment #8620963 - Attachment description: Patch v1.3: Whole word matching support for nsFind → Patch 1.3: Whole word matching support for nsFind
(Assignee)

Comment 101

2 years ago
Created attachment 8620968 [details] [diff] [review]
Patch 2: test coverage WIP

So this is the approach I'm taking wrt unit tests: browser-chrome with various HTML documents with various types of text that we'll try to make the word-breaker stumble upon.

Ehsan, do you have any thoughts on what (exotic) variations I can test and how? Ideally I'd like to work off a bullet list of scenarios that I can tick off.
Flags: needinfo?(ehsan)
The tricky stuff for tests that I can think of are cases where a word spans inline elements (such as he<span>ll</span>o), HTML inline elements that are marked as block elements via CSS and vice versa (such as he<span style="display:block">ll</span>o and he<div style="display:inline">ll</span>o), different kinds of boundaries for words (such as words appearing in the same textnode, words that are split off from other words on block boundaries or across things such as tables and lists, etc.), words that span adjacent textnodes (for example two adjacent text nodes containing "he" and "llo"), etc.  Hopefully this will give you ideas around the kinds of things I would be worried about.

Unfortunately, my review queue is way beyond manageable these days, and given that I have forgotten everything that I relearned about this code again, I'll need to read the underlying code once more before I can review this, which may take me quite a while (possibly after Whistler.)  Sorry.  :(
Flags: needinfo?(ehsan)
Sorry again for the delay here.  I will try my best to get to this some time next week.
(Assignee)

Comment 104

2 years ago
Would it help if I chopped the patch up in parts?
Not really, the biggest issue is that I need to read this code again and page in everything that I learned about it the previous time I reviewed the patch!
Comment on attachment 8620963 [details] [diff] [review]
Patch 1.3: Whole word matching support for nsFind

Review of attachment 8620963 [details] [diff] [review]:
-----------------------------------------------------------------

This is really close, but I would like to see another patch that addresses my comments below.  Please submit an interdiff.

The front-end changes look good to me overall, but I would feel more comfortable if you got another Firefox peer to double check them.

Thanks again, and so sorry for the slow turn around.  If you manage to address the review comments soon, the next iteration should be quick for me to look over since the details are still fresh in my mind.  :-)

Looking forward to see this feature landed!

::: embedding/components/find/nsFind.cpp
@@ +463,3 @@
>    , mCaseSensitive(false)
> +  , mEntireWord(false)
> +  , mWordBreaker(nullptr)

You don't need to initialize mWordBreaker.  The default constructor does the right thing.

@@ +575,3 @@
>  {
> +  mEntireWord = aEntireWord;
> +  mWordBreaker = aEntireWord ? nsContentUtils::WordBreaker() : nullptr;

Hmm, is mEntireWord really needed?  Can't we return false from GetEntireWord when mWordBreakeer is null?

@@ +735,5 @@
>  #endif
>    return NS_OK;
>  }
>  
> +class PeekNextCharRestoreState

Nit: Please mark this as final and MOZ_STACK_CLASS.

@@ +738,5 @@
>  
> +class PeekNextCharRestoreState
> +{
> +public:
> +  PeekNextCharRestoreState(nsFind* aFind)

Please mark this as explicit.

@@ +760,5 @@
> +  nsCOMPtr<nsINode> mCurrNode;
> +  nsRefPtr<nsFind> mFind;
> +};
> +
> +char16_t nsFind::PeekNextChar(nsIDOMRange* aSearchRange,

Nit: return type on its own line, please.

@@ +768,5 @@
> +  // We need to restore the necessary member variables before this function
> +  // returns
> +  PeekNextCharRestoreState restoreState(this);
> +
> +  // Initialize variables before the loop

The code seems to not do that the comment suggests!

@@ +795,5 @@
> +
> +    frag = tc->GetText();
> +    fragLen = frag->GetLength();
> +  }
> +  while (fragLen <= 0);

Nit: } while (...);

@@ +802,5 @@
> +  const char *t1b = nullptr;
> +
> +  if (frag->Is2b()) {
> +    t2b = frag->Get2b();
> +    t1b = nullptr;

This assignment can be eliminated.

@@ +805,5 @@
> +    t2b = frag->Get2b();
> +    t1b = nullptr;
> +  } else {
> +    t1b = frag->Get1b();
> +    t2b = nullptr;

So can this one.

@@ +1267,5 @@
> +#ifdef DEBUG_FIND
> +          printf("Checking for word-break at end of word... %s\n",
> +            mWordBreaker->BreakInBetween(&c, 1, &nextChar, 1) ? "yes" : "no");
> +          DumpNode(matchAnchorNode);
> +#endif

Please remove this #ifdef block.

::: embedding/components/find/nsFind.h
@@ +39,5 @@
>  
> +  int32_t mIterOffset;
> +  nsCOMPtr<nsIDOMNode> mIterNode;
> +
> +  nsRefPtr<nsFindContentIterator> mIterator;

Don't make these members public please, they need to be private.  If you need access to them from the outside, please add accessor methods.
Attachment #8620963 - Flags: review?(ehsan) → review-
(Assignee)

Comment 107

2 years ago
Thanks so much for the review, Ehsan!! I will continue working in this when I get back from vacation :-)
(In reply to Mike de Boer [:mikedeboer] from comment #107)
> Thanks so much for the review, Ehsan!! I will continue working in this when
> I get back from vacation :-)

Hi Mike, hope your vacation was great and you returned with loads of energy ;)
It's ping time again for this bug... :)
Flags: needinfo?(mdeboer)
(Assignee)

Comment 109

2 years ago
Created attachment 8688964 [details] [diff] [review]
Patch 1.4: Whole word matching support for nsFind

Interdiff coming up and unit tests for nsFind are in the works!
Attachment #8620963 - Attachment is obsolete: true
Flags: needinfo?(mdeboer)
(Assignee)

Comment 110

2 years ago
Created attachment 8688965 [details] [diff] [review]
Interdiff between 1.3 and 1.4
(Assignee)

Updated

a year ago
Priority: -- → P1
(Assignee)

Comment 111

a year ago
Created attachment 8748152 [details] [diff] [review]
Patch 1.5: Whole word matching support for nsFind

The previous interdiff still counts!
I see you're not 'accepting' reviews atm, but I don't think it's really fair to punt this to someone else. I wouldn't know who, TBH! :jfkthame?
Anyway, please feel free to proxy this request!
Attachment #8688964 - Attachment is obsolete: true
Attachment #8748152 - Flags: review?(ehsan)
(Assignee)

Comment 112

a year ago
Created attachment 8748155 [details] [diff] [review]
Patch 2: add tests for findbar entire word searching feature

Finally got around to writing the tests for this puppy! As you can see, it's quite extensible to contain coverage for multiple languages in the future.
Attachment #8620968 - Attachment is obsolete: true
Attachment #8748155 - Flags: review?(ehsan)
Blocks: 1271782
(Assignee)

Comment 113

a year ago
Comment on attachment 8748155 [details] [diff] [review]
Patch 2: add tests for findbar entire word searching feature

Review of attachment 8748155 [details] [diff] [review]:
-----------------------------------------------------------------

Jared, may I punt this to you? Please feel free to assign someone else who you'd think to be a better fit (I can't think of someone ;-) ).
Attachment #8748155 - Flags: review?(ehsan) → review?(jaws)
(Assignee)

Updated

a year ago
Attachment #8748155 - Flags: review?(jaws) → review?(dao+bmo)
(Assignee)

Comment 114

a year ago
Comment on attachment 8748152 [details] [diff] [review]
Patch 1.5: Whole word matching support for nsFind

Review of attachment 8748152 [details] [diff] [review]:
-----------------------------------------------------------------

Dão, would you mind reviewing the /toolkit/ bits of this patch along with the unit tests?
Attachment #8748152 - Flags: review?(dao+bmo)
Comment on attachment 8748152 [details] [diff] [review]
Patch 1.5: Whole word matching support for nsFind

>         prefsvc.addObserver("accessibility.typeaheadfind",
>                             this._observer, false);
>         prefsvc.addObserver("accessibility.typeaheadfind.linksonly",
>                             this._observer, false);
>         prefsvc.addObserver("accessibility.typeaheadfind.casesensitive",
>                             this._observer, false);
>+        prefsvc.addObserver("accessibility.typeaheadfind.entireword",
>+                            this._observer, false);

While "accessibility.typeaheadfind" by itself is an accessibility feature, the "accessibility.typeaheadfind." prefix seems weird and just wrong. Please don't use that for new prefs.

>+          let prefsvc =
>+          Components.classes["@mozilla.org/preferences-service;1"]
>+                    .getService(Components.interfaces.nsIPrefBranch);

nit, indent like this:

>          let prefsvc =
>            Components.classes["@mozilla.org/preferences-service;1"]
>                      .getService(Components.interfaces.nsIPrefBranch);


This will need ui-review too.
Attachment #8748152 - Flags: review?(dao+bmo)
(Assignee)

Comment 116

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=33fa5ef86986
(Assignee)

Comment 117

a year ago
Comment on attachment 8748152 [details] [diff] [review]
Patch 1.5: Whole word matching support for nsFind

Stephen, would you like to review this new findbar feature?
You can find fresh build at https://archive.mozilla.org/pub/firefox/try-builds/mdeboer@mozilla.com-d4101bdf2ae3cc67b4b8f721a914093eb761b7b8/

To see things in action, make sure you:
1. Start the build you just downloaded
2. Open the findbar on any web page you fancy
3. Find the newly added button with the caption "Whole Words" and make sure it's toggled on
4. Start searching for a word on the page. Notice that it doesn't match partial words anymore, but whole words only.

Thanks!
Attachment #8748152 - Flags: ui-review?(shorlander)
Comment on attachment 8748152 [details] [diff] [review]
Patch 1.5: Whole word matching support for nsFind

Review of attachment 8748152 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. I did find a bug with Highlighting though: Highlight All doesn't update if you check Whole Words

STR:
1) Go to a page
2) Open Find Bar
3) Make sure no options are toggled
4) Search for a term that has whole and partial matches
5) Toggle Highlight All
6) Toggle Whole Words

"Highlight All" highlights all matching terms, whole and partial. If you toggle "Whole Words" it keeps partial words highlighted even though it won't match them.

The reverse is also true: If you select "Whole Words" first then "Highlight All" and then uncheck "Whole Words" it doesn't then highlight partial matches.

Expected: Should correctly highlight all matches.
Attachment #8748152 - Flags: ui-review?(shorlander) → ui-review-
(Assignee)

Comment 119

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fe7b0f8abf1
(Assignee)

Comment 120

a year ago
(In reply to Stephen Horlander [:shorlander] from comment #118)
> This looks good to me. I did find a bug with Highlighting though: Highlight
> All doesn't update if you check Whole Words

Good find, thanks! Please find new builds at https://archive.mozilla.org/pub/firefox/try-builds/mdeboer@mozilla.com-648f49d509c104b3fbb42ceb49708a77c4181fe2/ - they need a while to bake still.
(Assignee)

Comment 121

a year ago
Created attachment 8756896 [details] [diff] [review]
Patch 1.6: Whole word matching support for nsFind
Attachment #8748152 - Attachment is obsolete: true
Attachment #8748152 - Flags: review?(ehsan)
Attachment #8756896 - Flags: ui-review?(shorlander)
Attachment #8756896 - Flags: review?(ehsan)
Attachment #8756896 - Flags: review?(dao+bmo)
Comment on attachment 8756896 [details] [diff] [review]
Patch 1.6: Whole word matching support for nsFind

Review of attachment 8756896 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Thanks!
Attachment #8756896 - Flags: ui-review?(shorlander) → ui-review+
Comment on attachment 8756896 [details] [diff] [review]
Patch 1.6: Whole word matching support for nsFind

>+              this._self._typeAheadEntireWord = prefsvc.getBoolPref(aPrefName);

nit: drop "typeAhead" from that property name

>+          this._dispatchFindEvent("entirewordchange");

Where is this used?
(Assignee)

Comment 124

a year ago
Created attachment 8758242 [details] [diff] [review]
Patch 1.7: Whole word matching support for nsFind

Interdiff for Ehsan still relevant, only XUL & JS changes.
Attachment #8756896 - Attachment is obsolete: true
Attachment #8756896 - Flags: review?(ehsan)
Attachment #8756896 - Flags: review?(dao+bmo)
Attachment #8758242 - Flags: review?(ehsan)
Attachment #8758242 - Flags: review?(dao+bmo)
Comment on attachment 8758242 [details] [diff] [review]
Patch 1.7: Whole word matching support for nsFind

>           if (!this._findFailedString ||
>-              !val.startsWith(this._findFailedString))
>+              !val.startsWith(this._findFailedString) || this._entireWord)

nit: break this into three lines. While you're touching this, '{' doesn't belong on its own line either.

>       <method name="_dispatchFindEvent">
>         <parameter name="aType"/>
>         <parameter name="aFindPrevious"/>
>         <body><![CDATA[
>           let event = document.createEvent("CustomEvent");
>           event.initCustomEvent("find" + aType, true, true, {
>             query: this._findField.value,
>             caseSensitive: !!this._typeAheadCaseSensitive,
>+            entireWord: this._entireWord,
>             highlightAll: this.getElement("highlight").checked,
>             findPrevious: aFindPrevious
>           });
>           return this.dispatchEvent(event);
>         ]]></body>
>       </method>

I assume this is currently unused too...

> >+          this._dispatchFindEvent("entirewordchange");
> 
> Where is this used?

Apparently casesensitivitychange is used for pdf.js. We'll probably need a followup bug for pdf.js to support this feature as well.
Attachment #8758242 - Flags: review?(dao+bmo) → review+
Comment on attachment 8758242 [details] [diff] [review]
Patch 1.7: Whole word matching support for nsFind

Review of attachment 8758242 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!
Attachment #8758242 - Flags: review?(ehsan) → review+

Comment 127

11 months ago
Comment on attachment 8748155 [details] [diff] [review]
Patch 2: add tests for findbar entire word searching feature

>+      // Tests delays the loading of a document for one second.
>+      yield new Promise(resolve => setTimeout(resolve, 1000));

Why?
Attachment #8748155 - Flags: review?(dao+bmo) → review+
(Assignee)

Updated

11 months ago
Blocks: 1282759
(Assignee)

Comment 128

11 months ago
(In reply to Dão Gottwald [:dao] from comment #127)
> Why?

Something I copy-pasted and wrote myself in other tests... no clue, really. To appease the testing gods?

I'll remove and see how it goes... living on the edge ;)
(Assignee)

Comment 129

11 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a13d88fa497c
(Assignee)

Comment 130

11 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2b3c783bb7ea
(Assignee)

Comment 131

11 months ago
https://hg.mozilla.org/integration/fx-team/rev/1066e02a466eed00646c7049b1dfe8cf7a60c3fb
Bug 269442 - whole word matching support for nsFind. r=ehsan,dao. ui-r=shorlander

https://hg.mozilla.org/integration/fx-team/rev/11ae0d2c828791829afdce0ae297c96f506b9396
Bug 269442 - add tests for findbar entire word searching feature. r=dao

Comment 132

11 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/1066e02a466e
https://hg.mozilla.org/mozilla-central/rev/11ae0d2c8287
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago11 months ago
status-firefox50: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
(Assignee)

Comment 133

10 months ago
Release Note Request (optional, but appreciated)
[Why is this notable]: well, there might be quite a number of people who were waiting for 12 years to see this get implemented. Safari and Blink have had this a long time already, so it might also be something that made people switch before.
[Suggested wording]: Find in page now supports a mode to search for whole words only.
[Links (documentation, blog post, etc)]: I'm afraid my blogging is lacking for a while now. On it, but don't know when I'll have a post up.
relnote-firefox: --- → ?

Comment 134

10 months ago
Added to Fx50 (Aurora) release notes.
relnote-firefox: ? → 50+

Comment 135

9 months ago
Hi I tried this is nightly beta, and it seems to have some undesired behavior in Chinese.

The word break detector used seems to look for changes in character class.  In  languages in which the words are not separated by spaces, there is no character class change at the end of the word.
For example if you search for the single character word 在 on the page https://zh.wikipedia.org/wiki/Wikipedia:%E9%A6%96%E9%A1%B5  and then turn on the whole word matching, only one instance is found - " 在1960" - the one in which the word is surrounded by non-CJK characters.

In contrast, if you try that same page in IE, turning whole word matching on and off for Chinese seems to have no change in the number of words found.

It may be possible to get better behavior using the ICU word break iterator.
(Assignee)

Comment 136

9 months ago
(In reply to JonathanW from comment #135)
> Hi I tried this is nightly beta, and it seems to have some undesired
> behavior in Chinese.
> 
> The word break detector used seems to look for changes in character class. 
> In  languages in which the words are not separated by spaces, there is no
> character class change at the end of the word.
> For example if you search for the single character word 在 on the page
> https://zh.wikipedia.org/wiki/Wikipedia:%E9%A6%96%E9%A1%B5  and then turn on
> the whole word matching, only one instance is found - " 在1960" - the one in
> which the word is surrounded by non-CJK characters.
> 
> In contrast, if you try that same page in IE, turning whole word matching on
> and off for Chinese seems to have no change in the number of words found.
> 
> It may be possible to get better behavior using the ICU word break iterator.

Thanks for checking this! As I have no knowledge of the desired behavior when using Chinese, I'm quite dependent on contributions by people who do. I already wrote the unit tests to be able to accommodate other languages and encodings with relative ease.
I don't think nsIWordBreaker is using ICU, so that'd be something to look at. Or skip fixing up nsIWordBreaker and include ICU directly.
Could you file a bug, that blocks this one, which will continue this?

Comment 137

8 months ago
File Bug 1302492 - The new Find Whole Word/ Find Exact String Option does not find Chinese words

Updated

8 months ago
Depends on: 1302492
I've managed to perform some test around this feature, and I can confirm that this is working as expected. Since JonathanW filed a new bug for the string problem with Chinese words (comment 137) I will mark this issue as verified fixed.

The test were done under Windows 10 x64, Ubuntu 16.04 x64 and macOS 10.12 on 50.0b11 (20161027110534).
Status: RESOLVED → VERIFIED
status-firefox50: fixed → verified
Flags: qe-verify+

Updated

6 months ago
Depends on: 1318535

Comment 139

6 months ago
The behavior is not consistent with the Match Case option: enabling Match Case with some text entered moves you to the next result, whereas enabling Whole Words does not. More than that, it shows wrong matches number - "of 100 matches" when there is none. Just search for "fin" on this bug page and enable Whole Words. I would open a new bug for this, but I already opened a more critical bug 1318535 and it went unnoticed, so I decided it'll be better to just comment here.
You need to log in before you can comment on or make changes to this bug.