Closed
Bug 37941
Opened 24 years ago
Closed 22 years ago
[RFE] Regular Expression Searches
Categories
(SeaMonkey :: UI Design, enhancement, P3)
Tracking
(Not tracked)
Future
People
(Reporter: danielpeng, Assigned: sfraser_bugs)
References
Details
(Keywords: helpwanted)
Find sections in a page based on a regular expression.
Comment 1•24 years ago
->sfraser
Assignee: don → sfraser
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 4•24 years ago
moving back to previous owner
Assignee: beppe → sfraser
Target Milestone: M20 → Future
Comment 6•23 years ago
I've been thinking of working on something along these lines. Any ideas on how to do regexp stuff in C++ (since the search is currently in C++)?
Comment 7•23 years ago
You'll need a RegExp implementation. The JS engine has one, but I don't believe it's exposed; doing that would probably require some refactoring.
Who needs it in C++? We're all about XPCOM, baby! You need an nsIRegExp interface, a tiny JS implementation, and wham-o: Henry Spencer's your uncle!
Comment 9•23 years ago
The JS RegExp stuff can be exposed. Some of it is. You'll need a JSContext*, which you can get from an nsIScriptContext, which you can get from a DOM global object. Where would the call(s) to compile and test a regexp come from? Here's a question: does the search have to convert the entire document into a string to find a match? Or is there an iterator that can be used character by character? /be
Comment 10•23 years ago
*** Bug 118507 has been marked as a duplicate of this bug. ***
Comment 11•23 years ago
mass moving open bugs pertaining to find in page/frame to pmac@netscape.com as qa contact. to find all bugspam pertaining to this, set your search string to "AppleSpongeCakeWithCaramelFrosting".
QA Contact: sairuh → pmac
Comment 12•22 years ago
I have bug 32641, to implement simple wildcard searching. The request in that bug asks specifically that full regexps NOT be implemented. We should choose one of these two options, and resolve one of these bugs as a dup of the other. I should probably own the resulting bug, unless Simon specifically wants this one.
Simon (or anyone on the cc list), what do you think? Wildcard or regexp? And please feel free to reassign this one to me, or dup it to 32641.
Unfortunately, we can't just plug in a regexp library, as our searches have to be able to span multiple DOM nodes while iterating backward and forward in the DOM and skipping over invisible nodes.
Comment 13•22 years ago
I have to say that I find regexp searching infinitely more useful than wildcard searching. I also agree (unfortunately) that Brian has a point in bug 32641 -- wildcard searching may be a lot more likely to be understandable to the average "power user" who is used to dealing with ? and * in shells. I was wondering whether it would be possible to make the code flexible enough that the matching engine could be swapped out (so that someone could write a "regexp search" xpi) while keeping it fast (and whether we really care, I guess).
Comment 14•22 years ago
I've been thinking some more.... For a typical document with text (which is what one mostly views with a web browser) the only wildcard that's really useful is "?"... "*" would have a strong tendency to match something like half the document. (And it sounds like bug 32641 is asking for "*foo*" to act as the "\b\w*foo\w*\b" regexp, which is, imo, not at all intuitive even for someone used to wildcards.) Also, on Unix, regular expression searches are the standard for tools that manipulate text and allow searching -- wildcards are only really used by shells.
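To make the two readings of "*foo*" in comment 14 concrete, here is a minimal Python sketch. The helper name `wildcard_to_regex` and the sample sentence are hypothetical, and the translation rule ("*" becomes `\w*`, "?" becomes `\w`, the whole pattern wrapped in `\b`) is only one interpretation of what bug 32641 seems to ask for, not anything taken from the Mozilla source.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate a shell-style wildcard into a word-bounded regexp.

    Assumed rule (hypothetical): '*' -> zero or more word characters,
    '?' -> exactly one word character, everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return r"\b" + "".join(parts) + r"\b"

text = "A foolish tofu buffoon ate food."

# Word-bounded reading: each match is a single word containing "foo".
word_rx = wildcard_to_regex("*foo*")
word_matches = re.findall(word_rx, text)

# Shell-style reading: '*' matches anything, so one match swallows the line.
greedy_match = re.search(r".*foo.*", text).group(0)

print(word_rx)
print(word_matches)
print(greedy_match)
```

Under the word-bounded reading the pattern picks out individual words, while the shell-style reading matches the entire sentence, which is the "half the document" problem described above.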
Comment 15•22 years ago
> Unfortunately, we can't just plug in a regexp library,
> as our searches have to be able to span multiple DOM
> nodes while iterating backward and forward in the
> dom and skipping over invisible nodes.
How does the literal text matching engine do it?
This might be Too Much Bloat, but what if a search command caused a plain text
version to be assembled on the fly, with an associated table of relations
between positions in the plain text version and location in the document? The
table would not need to be complete, only enough to be able to construct
information to highlight "this much off the end of that node, these nodes, and
that much off the start of the next node".
The text equivalent and table would be built only when the search was initiated
(and cached until some DHTML whatzit rendered the table obsolete). The plain
text could be searched by any regex library which could return start/end of
match indices which would then be translated into a useful result relative to
the actual page.
If the literal text search doesn't already do something like this, it *could* be
doing something like this, and the whole thing could be very pluggable. One
could have different options for search: literal text, wildcards, regular
expressions, soundex, whatever.
-matt
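A rough Python sketch of the proposal in comment 15, for illustration only. It assumes the document can be modelled as a flat list of `(node_id, text, visible)` tuples. That model and the names `build_index`, `locate`, and `find_regex` are hypothetical and do not correspond to any Mozilla API. Empty matches are ignored to keep the offset arithmetic simple.

```python
import re
from bisect import bisect_right

def build_index(nodes):
    """Concatenate visible node text and record where each node starts."""
    pieces, starts, ids = [], [], []
    pos = 0
    for node_id, text, visible in nodes:
        if not visible or not text:
            continue  # skip invisible or empty nodes
        starts.append(pos)
        ids.append(node_id)
        pieces.append(text)
        pos += len(text)
    return "".join(pieces), starts, ids

def locate(starts, ids, offset):
    """Map an offset in the plain text back to (node_id, offset_in_node)."""
    i = bisect_right(starts, offset) - 1
    return ids[i], offset - starts[i]

def find_regex(nodes, pattern):
    """Return ((start_node, start_off), (end_node, end_off)) or None.

    The end offset is exclusive within its node.
    """
    plain, starts, ids = build_index(nodes)
    m = re.search(pattern, plain)
    if m is None or m.end() == m.start():
        return None
    start = locate(starts, ids, m.start())
    last_id, last_off = locate(starts, ids, m.end() - 1)
    return start, (last_id, last_off + 1)

nodes = [
    ("n1", "Regular expr", True),
    ("n2", "hidden", False),       # invisible, must be skipped
    ("n3", "ession sear", True),
    ("n4", "ches", True),
]
result = find_regex(nodes, r"expr\w+ s\w+")
print(result)
```

Here the match spans three visible nodes: it starts 8 characters into `n1`, covers all of `n3`, and ends 4 characters into `n4`, which is the "this much off the end of that node, these nodes, and that much off the start of the next node" information the comment describes.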
Comment 16•22 years ago
akkana: feel free to take this. I'd also strongly recommend that you future it :)
Comment 17•22 years ago
I'm going to dup this to bug 32641. Those arguing for regexps rather than wildcards, discuss it there where the pro-wildcard folks are.
Boris:
> I was wondering whether it would be possible to make the code flexible enough
> that the matching engine could be swapped out
Unfortunately, no, it isn't really possible.
m_mozilla:
> what if a search command caused a plain text
> version to be assembled on the fly
That's what the previous version of find did, and that's why it was up to an order of magnitude slower on big documents. It's not reasonable on big documents. Part of the problem was that it had to be redone for every search, because we have no way of knowing whether the document changed since the last search.
In various attempts at rewriting this code, I tried several different approaches involving combining text from several text nodes together and then calling the built-in searches in our string classes (which would also have allowed for calling regexp comparisons), but the result wasn't fast enough, and I never came up with a satisfactory answer to the question of "How do you determine how many nodes you have to convert to plaintext before you have enough to call the pattern search on it?" I suppose you could just keep building the string as you iterate through the document, re-doing the regexp search each time.
*** This bug has been marked as a duplicate of 32641 ***
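The last idea in comment 17 (keep building the string while iterating, and redo the regexp search each time) might look roughly like the Python sketch below. The function name `incremental_search` and the list-of-strings node model are hypothetical and are not how the Mozilla find code is structured.

```python
import re

def incremental_search(node_texts, pattern):
    """Append one node's text at a time and rescan the whole buffer.

    Returns (matched_text, nodes_consumed) for the first match, or None.
    Each step rescans everything collected so far, so the total work grows
    roughly quadratically with document size.
    """
    rx = re.compile(pattern)
    buffer = ""
    for count, text in enumerate(node_texts, start=1):
        buffer += text
        m = rx.search(buffer)
        if m:
            return m.group(0), count
    return None

node_texts = ["wild", "card or reg", "exp?", " more text"]
found = incremental_search(node_texts, r"reg\w+")
print(found)
```

This also shows why "how many nodes is enough?" has no clean answer: with a pattern such as `reg\w*` the search would already succeed after the second node with the shorter match "reg", even though the next node extends the word.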
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Updated•20 years ago
Product: Core → Mozilla Application Suite