Search function is broken when one string is a part of another and they overlap
Categories
(Core :: Find Backend, defect)
Tracking
()
People
(Reporter: osowiecki, Assigned: jjaschke)
Details
Attachments
(3 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0
Steps to reproduce:
Open such .txt file or html in firefox :
Substring: 'AGTCCCTCAAG (forward) (length: 11)'
Occurs 2 times forward in the target sequence
Forward occurrence in reference sequence: 1
Reversed occurrence in reference sequence: 0
Total occurrences in reference sequence: 1
Unmasked target sequence: AGTCCCTCAAGTCCCTCAAGCCGCCACCGCCGCC
Unmasked target sequence: AGTCCCTCA
AGTCCCTCAAGCCGCCACCGCCGCC
Masked target sequence: ***********TCCCTCAAGCCGCCACCGCCGCC
Unmasked reference sequence: AGTCCCTCAAGT
Masked reference sequence: ***********T
Actual results:
"AGTCCCTCAAG" is only found once.
Expected results:
"AGTCCCTCAAG" occurs twice in the
Unmasked target sequence: AGTCCCTCAAGTCCCTCAAGCCGCCACCGCCGCC
#first occurrence
"AGTCCCTCAAG"
TCCCTCAAGCCGCCACCGCCGCC
#second occurrence
AGTCCCTCA
"AGTCCCTCAAG"
CCGCCACCGCCGCC
![]() |
||
Updated•11 months ago
|
![]() |
||
Comment 1•11 months ago
|
||
I can reproduce the issue on Nightly131.0a1 Windows11.
Seems like multiple open-source projects are affected. I saw the same behaviour in XED, Geany and LibreOffice.
Updated•11 months ago
|
Assignee | ||
Comment 3•11 months ago
|
||
Simplified test case: document content abababab
and search string aba
. Expected would be 3 matches, only found 2. It looks like search for the next result begins at the end of the previous match instead of starting one character after the previous match's start.
Assignee | ||
Comment 4•11 months ago
|
||
Chrome shows same behavior, Safari even only finds one occurrence.
Emilio, afaik there is no spec for find-on-page. Since none of the browsers show the behavior expected in comment 0 (and the behavior I would have expected), what do you think about this, is this something we should fix?
![]() |
||
Comment 5•11 months ago
|
||
- Open data:text/html,ababab<br>aba
- Ctrl+F, type aba
- Click NEXT
- Click PREVIOUS
Matching parts are different between 'NEXT' and 'PREVIOUS'.
I think they should be the same.
Assignee | ||
Comment 6•11 months ago
|
||
Thanks, yeah that's definitely a bug. That also doesn't repro in Chrome or Safari.
Comment 7•11 months ago
|
||
Seems worth fixing, even if relatively obscure... Might be worth checking if this is a regression, but my guess from how the find code works is that it is not.
I think the bug is around here.
We start the search at the end of the current range (when forward, opposite when searching backwards).
It seems instead we should start at the next character past the beginning of the range (or something like that?). Something like "start at the current start, but make sure to skip one character before starting the match" (because otherwise we'd infinitely loop).
Assignee | ||
Comment 8•11 months ago
|
||
Not sure -- isn't the purpose of nsFind
to just find the first occurrence of the pattern in a range and then return? If I understand correctly, your proposal would make nsfind not find the first match, because it would start by skipping the first character. The signature of nsFind::Find()
with all these ranges is super-weird though, it feels redundant to have the whole search range, the start point (which is a collapsed range) and the end point (which is also a collapsed range). Would be easier to understand if there would only be the search range, which would have its boundaries moved to the start/end points.
However, I think we would need to update the callers. Here for example nsfind is called in a loop, and the start point is the last result collapsed to the end. I think this needs to be updated to be "one character behind the range's start", I assume we don't have a helper for that. :) Tried it out briefly, looks like there must be more places. Fixing this only corrected the number of results, but F3'ing through the results doesn't find all matches.
Comment 9•11 months ago
|
||
(In reply to Jan Jaeschke [:jjaschke] from comment #4)
Emilio, afaik there is no spec for find-on-page.
There isn't, but long-term, I think we should work on standardizing an extension of the basic idea of what Gecko does instead of baking collator-based search into specs. See https://github.com/tc39/ecma402/blob/main/meetings/notes-2024-05-23.md#add-locale-sensitive-substring-matching-functions-to-intlcollator
Assignee | ||
Comment 10•11 months ago
|
||
So, I could manage to write a fix for this. However, I'm not sure about UX:
STR:
- Open data:text/html,ababab<br>aba
- Ctrl+F, type aba
- Select "Highlight All"
When "Highlight All" is selected, the first two results are shown as one "ababa" range (see attached image). Maybe it would be nicer if that were somewhat separated?
Assignee | ||
Comment 11•11 months ago
|
||
Updated•11 months ago
|
Assignee | ||
Comment 12•11 months ago
|
||
Here's a try run.
Assignee | ||
Comment 13•11 months ago
|
||
Looks like there are a few other tests affected.
Description
•