Open Bug 82054 Opened 24 years ago Updated 2 years ago

FindInReadable is O(m*n)

Categories

(Core :: XPCOM, defect, P3)

x86
All
defect

Tracking

()

Future

People

(Reporter: dbaron, Unassigned)

References

Details

Attachments

(6 files)

As the comment in nsReadableUtils.h says, FindInReadable is potentially O(N*M)
since it just tests for a match at each point along the segment.  I started
digging around in nsReadableUtils.cpp today to try to implement
CaseInsensitiveRFindInReadable, got distracted (and discouraged since the best
way to do what I want is to implement reverse iterators, which isn't something
that could be done in a few hours), and ended up hacking on this for a bit (and
a bit longer than I intended).  I'm sure there's a better approach, but I did
come up with something.  I'm not sure whether this is something we'd want.  If
it is, it needs a lot more testing...

Anyway, I'll stick the patch on this bug.  Any thoughts?
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla0.9.3
I would guess that most of the time, we search for small substrings (four
characters or less) in small strings (64 characters or less). In this case, I
would guess that setting up a DFA to do the search may well dwarf the cost of
doing the search itself, and would be beaten by brute force.

Does this patch have any noticable negative effect on the gross tests we run?
(jrgm's page loader or iBench, for example?) I'd suspect that this would be the
only way we'd be able to notice the small but pervasive tax we'd be paying to
set up the DFA.

scc and I discussed this when he was out last, and he was considering allowing
search to support a ``pluggable algorithm''...
for our new string classes, don't we know m and n before we start searching? in 
which case we should be able to easily pick some threshold -- of course we'd 
have to decide what we want it to be...
The construction of the DFA shouldn't dwarf the cost of searching since it's not
a full-fledged DFA, but a pseudo-DFA that's much simpler to construct.
Oops, that patch had a typo that caused none of the templates to be instantiated.
I only needed to fix one of the bugs in nsCharTraits.h (relating to not handling
reverse iterators correctly), and I did that in a somewhat ugly way.

The above patch also includes a test program that makes 120 test assertions. 
The first 6 fail, all of the following 114 pass.  The first 6 relate to what
should happen when we search for an empty string.  What *should* happen?
Nits: else after return non-sequitur, and gratuitous (-n) parentheses:

+        if (SameFragment(first, last)) {
+          PRInt32 n = last.get() - first.get();
+          return PRUint32((n>0) ? n : (-n));
+        } else {
+          return first.size_forward();
+        }

More tomorrow (need sleep), but one thought on empty matching: Perl and JS "bump
along" the index at which to resume when matching a global regular expression
that can match the empty string, so that the regexp (or the regexp plus its
target string, in Perl5) form an iterator.  For a flat string search, JS matches
'' at the beginning of the target string ("hello".indexOf('') returns 0;
"hello".indexOf(3) returns 3).  Hope this helps,

/be
I'm applying your patch so I can build it and examine it in situ.  Off the bat,
though, I might have fixed |readable_distance| with a judiciously placed |abs()|
or the like; and I'm leary of the normalization happening in your |RBegin..|s,
as that's what's stopping |copy_string| and friends from working for you.  It
needs some thinking about.
Why is the normalization breaking |copy_string|?  The problem seems to me that
|copy_string| expects |read| and |readable_distance| to return a pointer and a
length forward from that pointer.

One bit of the current code that would be impossible to duplicate with reverse
iterators is the part that assures that copy_string always works on overlapping
buffers (consider trying to use it to reverse a string).  Other than that it
doesn't seem too hard to make it work with reverse iterators.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Priority: P1 → P3
Target Milestone: mozilla0.9.5 → mozilla0.9.7
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Target Milestone: mozilla0.9.9 → Future
QA Contact: scc → string
Component: String → XPCOM
Assignee: dbaron → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: