Bug 32641 - Wildcard (or regexp/regular expression) searching
Status: NEW
Keywords: helpwanted
Product: SeaMonkey
Classification: Client Software
Component: Find In Page
Version: Trunk
Platform: All All
Importance: -- enhancement with 14 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Duplicates: 37941 197470 244835
Depends on: 106590
Blocks: 298127 106961
 
Reported: 2000-03-21 04:23 PST by Brian 'netdragon' Bober
Modified: 2013-06-18 08:48 PDT
CC List: 21 users

Description Brian 'netdragon' Bober 2000-03-21 04:23:33 PST
When you search, it should make a list of all the matching words in a little 
window that you can individually click on. Norton Systemworks 2000 registry 
editor does this if you need to see an example.

Pressing Enter in the search box doesn't do anything. Also, I don't 
know if you have done this already, but you should build a concordance of the 
page at some point, using a tree (a trie): the root points to a number of 
children, each representing a different letter, and each of those points to 
children for the next letter. When you add a word, you create a child for each 
letter and then, at the last letter, add a linked list recording the word and 
where in the page it occurs. 
I did that in one of my programs, and it could build a concordance of a 20 MB 
file in less than a second (on my computer) and find all occurrences in less 
than a second. (It used a lot of memory, though.) If the user doesn't have 
enough memory for a certain page, the concordance could be dropped.
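
A minimal sketch of the concordance described above, in Python. The names TrieNode, build_concordance and find_word are made up for illustration, and a plain Python list stands in for the linked list of positions. Words are assumed to be runs of ASCII letters and are indexed case-insensitively; neither choice comes from the original program.

```python
import re


class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode for the next letter
        self.positions = []   # offsets in the page where this word starts


def build_concordance(text):
    """Index every word in text, one trie level per letter (lowercased)."""
    root = TrieNode()
    for match in re.finditer(r"[A-Za-z]+", text):
        node = root
        for letter in match.group().lower():
            node = node.children.setdefault(letter, TrieNode())
        node.positions.append(match.start())
    return root


def find_word(root, word):
    """Return the offsets of every occurrence of word, or [] if absent."""
    node = root
    for letter in word.lower():
        node = node.children.get(letter)
        if node is None:
            return []
    return node.positions


page = "Hello world. Say hello to the halo."
index = build_concordance(page)
print(find_word(index, "hello"))   # [0, 17]
```

Lookup cost depends only on the length of the word, not on the size of the page, which is where the speed comes from; the memory cost is one node per distinct prefix.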
Comment 1 Brian 'netdragon' Bober 2000-03-21 04:27:35 PST
Also, a concordance is great for wildcard characters: *lo* would return 
Halo, Hello, Melon, etc., and Hell* would return Hello and Hell. For the first, 
all you have to do is return all children that have an l followed by an o. I 
think you should include wildcards in your browser search.
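
A rough sketch of the two wildcard queries against a letter tree, in Python. It assumes a nested-dict trie and hypothetical helper names (build_trie, prefix_query, substring_query). In this sketch only the Hell* case benefits from the tree structure; the *lo* case simply visits every stored word, which is one possible reading of "return all children that have an l followed by an o".

```python
def build_trie(words):
    """Nested-dict trie; the key "$" marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for letter in word.lower():
            node = node.setdefault(letter, {})
        node["$"] = word
    return root


def all_words(node):
    """Yield every word stored at or below node."""
    for key, child in node.items():
        if key == "$":
            yield child
        else:
            yield from all_words(child)


def prefix_query(root, prefix):
    """Hell* : walk down the prefix, then collect the whole subtree."""
    node = root
    for letter in prefix.lower():
        if letter not in node:
            return []
        node = node[letter]
    return sorted(all_words(node))


def substring_query(root, fragment):
    """*lo* : no fixed prefix, so every stored word is checked."""
    return sorted(w for w in all_words(root) if fragment.lower() in w.lower())


trie = build_trie(["Halo", "Hello", "Melon", "Hell", "World"])
print(prefix_query(trie, "Hell"))     # ['Hell', 'Hello']
print(substring_query(trie, "lo"))    # ['Halo', 'Hello', 'Melon']
```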
Comment 2 Gervase Markham [:gerv] 2000-03-21 08:26:04 PST
Rewriting bug to reflect last sentence of previous comment, which I think is 
the point boberb is trying to make :-). 

Gerv

Comment 3 Claudius Gayle 2000-03-21 13:36:42 PST
Find on Page = XPApps. reassigning.
Comment 4 Simon Fraser 2000-03-21 13:55:02 PST
Oh, gee, you'll want regular expressions next!
Comment 5 Brian 'netdragon' Bober 2000-03-21 13:58:29 PST
For clarification, I meant the Search document. This is very easy to implement 
(for text files), believe me. I did it for a school project. I did it without 
using STL. Using STL would make it even simpler. :)
Comment 6 Kevin Puetz 2000-03-21 16:42:17 PST
I do want regular expressions :-)
Comment 7 Bill Law 2000-03-22 16:45:33 PST
Setting target to M17.  This is really a thing for the "document text services" 
which is owned by somebody else but I'll take it for now.
Comment 8 don 2000-05-25 17:23:00 PDT
Move to M20 target milestone.
Comment 9 Paul Chen 2000-12-28 13:20:21 PST
nav triage team:

Not something we'll get to for beta1, marking nsbeta1-. Bill please reassign to 
the appropriate party.
Comment 10 Paul Chen 2001-03-20 10:44:43 PST
Marking nsbeta1- bugs as future to get off the radar
Comment 11 sairuh (rarely reading bugmail) 2002-01-24 15:12:41 PST
akkana, should this be yours...?
Comment 12 Akkana Peck 2002-01-24 15:20:02 PST
A request for wildcard searching, as specified in the summary, would be mine
(and sounds like a fun project).  But the original request here seems to be for
a redesign of the find UI, which should belong to some front-end person, not for
adding wildcard searching in the backend.  Could the submitter (or anyone else
interested in this bug) please clarify?
Comment 13 sairuh (rarely reading bugmail) 2002-01-24 15:25:51 PST
mass moving open bugs pertaining to find in page/frame to pmac@netscape.com as
qa contact.

to find all bugspam pertaining to this, set your search string to
"AppleSpongeCakeWithCaramelFrosting".
Comment 14 Brian 'netdragon' Bober 2002-01-27 02:17:04 PST
Akkana: This bug got a little overrun with requests :-) Please implement the
wildcard searching. I'm moving the list request to another bug. I would also
like ? in the search to match exactly one character. \* and \? would
stand for the literal * and ? characters. Please don't do regexp, as that would
baffle the average user. Thanks.
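
One way the wildcard syntax requested above could be translated for a regexp engine, sketched in Python. The function name wildcard_to_regex is hypothetical, and treating * as a shortest-possible match is an assumption; only *, ?, \* and \? come from the request.

```python
import re


def wildcard_to_regex(pattern):
    r"""Translate *, ?, \* and \? into a regular expression string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "*?":
            out.append(re.escape(pattern[i + 1]))  # \* or \? is a literal
            i += 2
            continue
        if ch == "*":
            out.append(".*?")   # any run of characters, as short as possible
        elif ch == "?":
            out.append(".")     # exactly one character
        else:
            out.append(re.escape(ch))  # everything else matches itself
        i += 1
    return "".join(out)


print(wildcard_to_regex("all*run"))                             # all.*?run
print(bool(re.search(wildcard_to_regex("h?llo"), "hello")))     # True
print(bool(re.search(wildcard_to_regex(r"what\?"), "whats")))   # False
```

The user never sees the regular expression; it only exists between the search box and the matcher.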
Comment 15 Brian 'netdragon' Bober 2002-01-27 13:16:38 PST
New bug is bug 122061 for list of search returns.
Comment 16 Akkana Peck 2002-03-25 17:25:17 PST
*** Bug 37941 has been marked as a duplicate of this bug. ***
Comment 17 Brant Gurganus 2002-10-13 11:31:56 PDT
[RFE] is deprecated in favor of severity: enhancement.  They have the same meaning.
Comment 18 Daniel Wang 2002-11-02 21:18:44 PST
*** Bug 177034 has been marked as a duplicate of this bug. ***
Comment 19 Boris Zbarsky [:bz] (still a bit busy) 2003-03-14 20:45:32 PST
*** Bug 197470 has been marked as a duplicate of this bug. ***
Comment 20 Jo Hermans 2003-05-20 04:10:11 PDT
*** Bug 206404 has been marked as a duplicate of this bug. ***
Comment 21 Malx 2003-05-20 04:44:39 PDT
Akkana: see bug #206404.
I do not think it is a duplicate of this bug.
This one is mostly about "regexp" and at most a "list of found words".
Mine is about "grep"-ing (a grep result contains _all strings_ with the keyword).
Comment 22 R.K.Aa. 2004-05-27 04:14:14 PDT
*** Bug 244835 has been marked as a duplicate of this bug. ***
Comment 23 Met - Martin Hassman 2004-05-28 02:53:52 PDT
Some comments on the wildcard vs. regular expression discussion.

Most of the duplicates of this bug asked for regular expressions. I am voting for
regular expressions instead of wildcards too 8-)

Look at searching in OpenOffice.org Writer (or even MS Word). They
support regular expressions too. The user can search the page the simple way (as
it is now), and when he wants more, he can use a regular expression.

I think these two possibilities, simple search and regular expression search,
are sufficient, and there is no need for a third way of searching,
wildcard search, which in terms of performance is something
between simple search and regexp.

So I think we should change the topic to "Regular expression searching".
Comment 24 Brian 'netdragon' Bober 2004-05-28 03:12:48 PDT
Regexp are evil for the average user. How about a checkbox:

[ ] Regular Expression search
Comment 25 Brian 'netdragon' Bober 2004-05-28 03:14:53 PDT
User wants to do all*run to find "all ducks run", "all cats run". That as a
regexp would be all.*?run or something like that. An average Windoze user will
not know how to do that stuff. Regexp can be an option in the search box, but
shouldn't be a default.
Comment 26 Met - Martin Hassman 2004-05-28 03:31:07 PDT
(In reply to comment #24, #25)
> An average Windoze user will not know how to do that stuff

Yes. For the average user there is simple search; for the experienced user,
regular expressions. See OpenOffice.org and MS Word; they do it the same way.

> Regexp are evil for the average user. How about a checkbox:
> [ ] Regular Expression search

Yes, I think so.
I have described it in detail in bug #244835.
I will paste it here:

----
It would be nice to have regular expression support when searching in
the page (this could apply not only to Browser, but to Editor and Mail too).

The "Find on the page" dialog should have a fourth checkbox, "Regular expression";
when it is checked, the search will treat the pattern as a regular expression.

MS Word and OpenOffice.org Writer do it this way! Most users use the
classical search (simple text), but the search dialogs of MS Word and
OpenOffice.org Writer have a checkbox for regular expression search, so
experienced users can use it 8-)
Comment 27 Malx 2004-05-28 03:39:55 PDT
Agreed: regexp should be an option in the dialog or in Mozilla's preferences.
And in addition to "list all found words" I would suggest "highlight all found
words", the same as "less" does. You just see all of them on the page. 

I would also propose "zoom to found", but it requires more complex programming.
That is, if you press "zoom", all blocks except those containing found words are
hidden. The problem is - what to call a block :) For table it would be table row.
For lists - the LI. For text - paragraph. Etc.
Comment 28 Met - Martin Hassman 2004-05-28 03:51:45 PDT
(In reply to comment #27)

> The problem is - what to call a block :) For table it would be table row. For
> lists - the LI. For text - paragraph. Etc.

Not a table row: tables are still used for page layout rather than for tabular
data, so that is a problem. OpenOffice offers some inspiration; see below, copied
from another part of bug #244835 ;-)

/////////
There are some open questions about how to handle the more problematic
cases, such as:

a) Moz<b>illa</b> <- should this be handled as Mozilla (one word) or as two
words, Moz and illa?

b) How to handle ^ and $: probably as the beginning and the end of the paragraph
(OpenOffice.org does it this way). But what exactly is a paragraph? Only the text
between <p> and </p> tags? Or everything that has display:block (which is
probably better)? 
/////////
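
Both questions can be tried out with a small Python sketch. It is illustrative only: BlockText and searchable_text are made-up names, and a fixed tag list stands in for "everything that has display:block", which a real browser would take from computed style. Inline tags such as <b> are dropped, so this sketch takes the one-word reading of case a); adding "b" to the tag list would give the two-word reading instead.

```python
import re
from html.parser import HTMLParser

# Assumption: a fixed tag list stands in for "everything with display:block".
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "br"}


class BlockText(HTMLParser):
    """Flatten markup to text: inline tags vanish, block tags become newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def searchable_text(html):
    parser = BlockText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


text = searchable_text("<p>Moz<b>illa</b> rocks</p><p>Find in page</p>")
print(re.findall(r"Mozilla", text))               # ['Mozilla']
print(re.findall(r"^\w+", text, re.MULTILINE))    # ['Mozilla', 'Find']
print(re.findall(r"\w+$", text, re.MULTILINE))    # ['rocks', 'page']
```

With block boundaries turned into newlines, the multiline mode of an ordinary regexp engine gives ^ and $ the per-paragraph meaning discussed above.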
Comment 29 Brian 'netdragon' Bober 2004-05-28 07:48:01 PDT
Re comment #28

a) Two words

b) Hmmmm... What do OpenOffice and MS Word do? We'd probably want to cut it off
at each block or <br>


Implementation:
Perhaps the best way to implement simple search would be to convert it
transparently into a regexp; then we only need one search back-end. This would
also allow the regexp code to be thoroughly tested.

Simple search possibilities:

The * wildcard would be replaced with a regexp matching a string of any length
The ? wildcard would be replaced with a regexp matching a single character
Quoting would be implicit: "the dog" would return only "the dog", not "dog the"
AND and OR would be allowed as modifiers; they would also be converted into their
regexp equivalents.
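
A sketch of the single back-end idea in Python, under several assumptions: the function names are made up, OR is mapped to alternation and AND to a chain of lookaheads (one possible "regexp equivalent", not necessarily the intended one), matching ignores case, and backslash escapes for literal * and ? are left out for brevity. It only reports whether the text matches; a real find-in-page would also need the match position.

```python
import re


def phrase_to_regex(phrase):
    """Escape a literal phrase, then re-enable * and ? as wildcards."""
    escaped = re.escape(phrase)
    return escaped.replace(r"\*", ".*?").replace(r"\?", ".")


def simple_to_regex(query):
    """OR becomes alternation; AND becomes a chain of lookaheads."""
    alternatives = []
    for or_part in query.split(" OR "):
        terms = [phrase_to_regex(t.strip()) for t in or_part.split(" AND ")]
        if len(terms) == 1:
            alternatives.append(terms[0])
        else:
            alternatives.append("".join(f"(?=.*{t})" for t in terms))
    return "|".join(f"(?:{a})" for a in alternatives)


def find(query, text, use_regex=False):
    """Single back-end: simple queries are translated, regexps pass through."""
    pattern = query if use_regex else simple_to_regex(query)
    return re.search(pattern, text, re.IGNORECASE | re.DOTALL) is not None


print(find("the dog", "dog the"))               # False: phrase order matters
print(find("cat AND dog", "dog and cat"))       # True: both terms present
print(find("19[7-9][0-9]", "in 1984", True))    # True: raw regexp mode
```

Because every simple query ends up as a regular expression, everyday searches exercise the same code path as the regexp checkbox would.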
Comment 30 Robert Kaiser 2009-06-14 07:03:19 PDT
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.

If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.

Query tag for this change: mass-UNCONFIRM-20090614
Comment 37 Mark Macdonald 2010-03-16 12:21:57 PDT
*** This bug has been confirmed by popular vote. ***
Comment 38 chAlx 2013-06-18 08:48:21 PDT
IMO this bug was not handled properly because of the very unclear initial comment. Maybe it should be closed and refiled in a clearer, more modern form.

What people need is a few ways to extend the search options:

1. Use alternatives: (regex|regular)
2. Use masks: 19[7-9][0-9]
3. Check context, including nearby words: (?<!sea)monkey [a-z]{0,20} banana

Regexp specials such as the different whitespace classes, modifiers, replacement, grouping, recursion, etc. are too exotic for common use. But regexps are the industry standard for advanced search, and a single well-maintained library gives almost "all one can want". So all of these options could be implemented at once via regexp search.
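
For illustration, here are the three patterns above run through Python's re module. The sample text is made up, and other regexp libraries may differ in details such as lookbehind support.

```python
import re

text = ("regex or regular; years 1969, 1984; "
        "seamonkey eats banana, monkey likes banana")

# 1. Alternatives: either spelling is found
alternatives = re.findall(r"(regex|regular)", text)

# 2. Masks (character classes): years from 1970 to 1999
years = re.findall(r"19[7-9][0-9]", text)

# 3. Context: "monkey" not preceded by "sea", one short word, then "banana"
context = re.findall(r"(?<!sea)monkey [a-z]{0,20} banana", text)

print(alternatives)   # ['regex', 'regular']
print(years)          # ['1984']
print(context)        # ['monkey likes banana']
```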
