Open Bug 82054 Opened 24 years ago Updated 3 years ago

FindInReadable is O(m*n)

Tracking

()

Status:

NEW

Milestone:

Future

People

(Reporter: dbaron, Unassigned)

References

Details

Attachments

(6 files)

algorithm I came up with (I should really look for a better one...) 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 10.86 KB, patch		Details \| Diff \| Splinter Review
patch with reverse iterators and RFind* too (untested, but compiles on Windows) 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 37.81 KB, patch		Details \| Diff \| Splinter Review
version of above that really compiles on Windows 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 41.99 KB, patch		Details \| Diff \| Splinter Review
above, plus fix to nsCharTraits.h, plus tests 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 62.17 KB, patch		Details \| Diff \| Splinter Review
compiles warning-free on VC6 and gcc 3.0, still struggling with gcc 2.7.2.3 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 69.88 KB, patch		Details \| Diff \| Splinter Review
also compiles on gcc 2.7.2.3 (split some stuff into nsStringIterator2.h) 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 75.56 KB, patch		Details \| Diff \| Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Description

•

24 years ago

As the comment in nsReadableUtils.h says, FindInReadable is potentially O(N*M) since it just tests for a match at each point along the segment. I started digging around in nsReadableUtils.cpp today to try to implement CaseInsensitiveRFindInReadable, got distracted (and discouraged since the best way to do what I want is to implement reverse iterators, which isn't something that could be done in a few hours), and ended up hacking on this for a bit (and a bit longer than I intended). I'm sure there's a better approach, but I did come up with something. I'm not sure whether this is something we'd want. If it is, it needs a lot more testing... Anyway, I'll stick the patch on this bug. Any thoughts?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 1

•

24 years ago

Attached patch algorithm I came up with (I should really look for a better one...) — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Status: NEW → ASSIGNED

Priority: -- → P2

Target Milestone: --- → mozilla0.9.3

Chris Waterson

Comment 2

•

24 years ago

I would guess that most of the time, we search for small substrings (four characters or less) in small strings (64 characters or less). In this case, I would guess that setting up a DFA to do the search may well dwarf the cost of doing the search itself, and would be beaten by brute force. Does this patch have any noticable negative effect on the gross tests we run? (jrgm's page loader or iBench, for example?) I'd suspect that this would be the only way we'd be able to notice the small but pervasive tax we'd be paying to set up the DFA. scc and I discussed this when he was out last, and he was considering allowing search to support a ``pluggable algorithm''...

timeless

Comment 3

•

24 years ago

for our new string classes, don't we know m and n before we start searching? in which case we should be able to easily pick some threshold -- of course we'd have to decide what we want it to be...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 4

•

24 years ago

The construction of the DFA shouldn't dwarf the cost of searching since it's not a full-fledged DFA, but a pseudo-DFA that's much simpler to construct.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 5

•

24 years ago

Attached patch patch with reverse iterators and RFind* too (untested, but compiles on Windows) — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 6

•

24 years ago

Oops, that patch had a typo that caused none of the templates to be instantiated.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 7

•

24 years ago

Attached patch version of above that *really* compiles on Windows — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 8

•

24 years ago

Attached patch above, plus fix to nsCharTraits.h, plus tests — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 9

•

24 years ago

I only needed to fix one of the bugs in nsCharTraits.h (relating to not handling reverse iterators correctly), and I did that in a somewhat ugly way. The above patch also includes a test program that makes 120 test assertions. The first 6 fail, all of the following 114 pass. The first 6 relate to what should happen when we search for an empty string. What *should* happen?

Brendan Eich [:brendan]

Comment 10

•

24 years ago

Nits: else after return non-sequitur, and gratuitous (-n) parentheses: + if (SameFragment(first, last)) { + PRInt32 n = last.get() - first.get(); + return PRUint32((n>0) ? n : (-n)); + } else { + return first.size_forward(); + } More tomorrow (need sleep), but one thought on empty matching: Perl and JS "bump along" the index at which to resume when matching a global regular expression that can match the empty string, so that the regexp (or the regexp plus its target string, in Perl5) form an iterator. For a flat string search, JS matches '' at the beginning of the target string ("hello".indexOf('') returns 0; "hello".indexOf(3) returns 3). Hope this helps, /be

Scott Collins

Comment 11

•

24 years ago

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 12

•

24 years ago

Why is the normalization breaking |copy_string|? The problem seems to me that |copy_string| expects |read| and |readable_distance| to return a pointer and a length forward from that pointer. One bit of the current code that would be impossible to duplicate with reverse iterators is the part that assures that copy_string always works on overlapping buffers (consider trying to use it to reverse a string). Other than that it doesn't seem too hard to make it work with reverse iterators.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 13

•

24 years ago

Attached patch compiles warning-free on VC6 and gcc 3.0, still struggling with gcc 2.7.2.3 — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Blocks: 77460

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 14

•

24 years ago

Attached patch also compiles on gcc 2.7.2.3 (split some stuff into nsStringIterator2.h) — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Priority: P2 → P1

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Target Milestone: mozilla0.9.3 → mozilla0.9.4

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Target Milestone: mozilla0.9.4 → mozilla0.9.5

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Priority: P1 → P3

Target Milestone: mozilla0.9.5 → mozilla0.9.7

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Target Milestone: mozilla0.9.7 → mozilla0.9.8

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Target Milestone: mozilla0.9.8 → mozilla0.9.9

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

24 years ago

Target Milestone: mozilla0.9.9 → Future

Phil Ringnalda (:philor)

Updated

•

16 years ago

QA Contact: scc → string

Nobody; OK to take it and work on it

Assignee

Updated

•

5 years ago

Component: String → XPCOM

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

5 years ago

Assignee: dbaron → nobody

Status: ASSIGNED → NEW

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.

algorithm I came up with (I should really look for a better one...) 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 10.86 KB, patch		Details \| Diff \| Splinter Review
patch with reverse iterators and RFind* too (untested, but compiles on Windows) 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 37.81 KB, patch		Details \| Diff \| Splinter Review
version of above that really compiles on Windows 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 41.99 KB, patch		Details \| Diff \| Splinter Review
above, plus fix to nsCharTraits.h, plus tests 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 62.17 KB, patch		Details \| Diff \| Splinter Review
compiles warning-free on VC6 and gcc 3.0, still struggling with gcc 2.7.2.3 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 69.88 KB, patch		Details \| Diff \| Splinter Review
also compiles on gcc 2.7.2.3 (split some stuff into nsStringIterator2.h) 24 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 75.56 KB, patch		Details \| Diff \| Splinter Review