Open
Bug 82054
Opened 24 years ago
Updated 2 years ago
FindInReadable is O(m*n)
Categories
(Core :: XPCOM, defect, P3)
Tracking
()
NEW
Future
People
(Reporter: dbaron, Unassigned)
References
Details
Attachments
(6 files)
10.86 KB,
patch
|
Details | Diff | Splinter Review | |
37.81 KB,
patch
|
Details | Diff | Splinter Review | |
41.99 KB,
patch
|
Details | Diff | Splinter Review | |
62.17 KB,
patch
|
Details | Diff | Splinter Review | |
69.88 KB,
patch
|
Details | Diff | Splinter Review | |
75.56 KB,
patch
|
Details | Diff | Splinter Review |
As the comment in nsReadableUtils.h says, FindInReadable is potentially O(N*M)
since it just tests for a match at each point along the segment. I started
digging around in nsReadableUtils.cpp today to try to implement
CaseInsensitiveRFindInReadable, got distracted (and discouraged since the best
way to do what I want is to implement reverse iterators, which isn't something
that could be done in a few hours), and ended up hacking on this for a bit (and
a bit longer than I intended). I'm sure there's a better approach, but I did
come up with something. I'm not sure whether this is something we'd want. If
it is, it needs a lot more testing...
Anyway, I'll stick the patch on this bug. Any thoughts?
Reporter | ||
Comment 1•24 years ago
|
||
Reporter | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla0.9.3
Comment 2•23 years ago
|
||
I would guess that most of the time, we search for small substrings (four
characters or less) in small strings (64 characters or less). In this case, I
would guess that setting up a DFA to do the search may well dwarf the cost of
doing the search itself, and would be beaten by brute force.
Does this patch have any noticable negative effect on the gross tests we run?
(jrgm's page loader or iBench, for example?) I'd suspect that this would be the
only way we'd be able to notice the small but pervasive tax we'd be paying to
set up the DFA.
scc and I discussed this when he was out last, and he was considering allowing
search to support a ``pluggable algorithm''...
for our new string classes, don't we know m and n before we start searching? in
which case we should be able to easily pick some threshold -- of course we'd
have to decide what we want it to be...
Reporter | ||
Comment 4•23 years ago
|
||
The construction of the DFA shouldn't dwarf the cost of searching since it's not
a full-fledged DFA, but a pseudo-DFA that's much simpler to construct.
Reporter | ||
Comment 5•23 years ago
|
||
Reporter | ||
Comment 6•23 years ago
|
||
Oops, that patch had a typo that caused none of the templates to be instantiated.
Reporter | ||
Comment 7•23 years ago
|
||
Reporter | ||
Comment 8•23 years ago
|
||
Reporter | ||
Comment 9•23 years ago
|
||
I only needed to fix one of the bugs in nsCharTraits.h (relating to not handling
reverse iterators correctly), and I did that in a somewhat ugly way.
The above patch also includes a test program that makes 120 test assertions.
The first 6 fail, all of the following 114 pass. The first 6 relate to what
should happen when we search for an empty string. What *should* happen?
Comment 10•23 years ago
|
||
Nits: else after return non-sequitur, and gratuitous (-n) parentheses:
+ if (SameFragment(first, last)) {
+ PRInt32 n = last.get() - first.get();
+ return PRUint32((n>0) ? n : (-n));
+ } else {
+ return first.size_forward();
+ }
More tomorrow (need sleep), but one thought on empty matching: Perl and JS "bump
along" the index at which to resume when matching a global regular expression
that can match the empty string, so that the regexp (or the regexp plus its
target string, in Perl5) form an iterator. For a flat string search, JS matches
'' at the beginning of the target string ("hello".indexOf('') returns 0;
"hello".indexOf(3) returns 3). Hope this helps,
/be
Comment 11•23 years ago
|
||
I'm applying your patch so I can build it and examine it in situ. Off the bat,
though, I might have fixed |readable_distance| with a judiciously placed |abs()|
or the like; and I'm leary of the normalization happening in your |RBegin..|s,
as that's what's stopping |copy_string| and friends from working for you. It
needs some thinking about.
Reporter | ||
Comment 12•23 years ago
|
||
Why is the normalization breaking |copy_string|? The problem seems to me that
|copy_string| expects |read| and |readable_distance| to return a pointer and a
length forward from that pointer.
One bit of the current code that would be impossible to duplicate with reverse
iterators is the part that assures that copy_string always works on overlapping
buffers (consider trying to use it to reverse a string). Other than that it
doesn't seem too hard to make it work with reverse iterators.
Reporter | ||
Comment 13•23 years ago
|
||
Reporter | ||
Comment 14•23 years ago
|
||
Reporter | ||
Updated•23 years ago
|
Priority: P2 → P1
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Reporter | ||
Updated•23 years ago
|
Priority: P1 → P3
Target Milestone: mozilla0.9.5 → mozilla0.9.7
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Reporter | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.9 → Future
Updated•15 years ago
|
QA Contact: scc → string
Assignee | ||
Updated•4 years ago
|
Component: String → XPCOM
Reporter | ||
Updated•4 years ago
|
Assignee: dbaron → nobody
Status: ASSIGNED → NEW
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•