Open Bug 1912664 Opened 11 months ago Updated 11 months ago

Search function is broken when one string is a part of another and they overlap

Categories

(Core :: Find Backend, defect)

Firefox 128
defect

Tracking

()

ASSIGNED
Tracking Status
firefox-esr115 --- wontfix
firefox-esr128 --- affected
firefox129 --- wontfix
firefox130 --- affected
firefox131 --- affected

People

(Reporter: osowiecki, Assigned: jjaschke)

Details

Attachments

(3 files)

Attached file example

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0

Steps to reproduce:

Open such .txt file or html in firefox :

Substring: 'AGTCCCTCAAG (forward) (length: 11)'
Occurs 2 times forward in the target sequence
Forward occurrence in reference sequence: 1
Reversed occurrence in reference sequence: 0
Total occurrences in reference sequence: 1
Unmasked target sequence: AGTCCCTCAAGTCCCTCAAGCCGCCACCGCCGCC
Unmasked target sequence: AGTCCCTCA
AGTCCCTCAAGCCGCCACCGCCGCC
Masked target sequence: ***********TCCCTCAAGCCGCCACCGCCGCC
Unmasked reference sequence: AGTCCCTCAAGT
Masked reference sequence: ***********T

Actual results:

"AGTCCCTCAAG" is only found once.

Expected results:

"AGTCCCTCAAG" occurs twice in the
Unmasked target sequence: AGTCCCTCAAGTCCCTCAAGCCGCCACCGCCGCC

#first occurrence
"AGTCCCTCAAG"
TCCCTCAAGCCGCCACCGCCGCC

#second occurrence
AGTCCCTCA
"AGTCCCTCAAG"
CCGCCACCGCCGCC

Component: Untriaged → Find Backend
Product: Firefox → Core

I can reproduce the issue on Nightly131.0a1 Windows11.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Seems like multiple open-source projects are affected. I saw the same behaviour in XED, Geany and LibreOffice.

Severity: -- → S3

Simplified test case: document content abababab and search string aba. Expected would be 3 matches, only found 2. It looks like search for the next result begins at the end of the previous match instead of starting one character after the previous match's start.

Chrome shows same behavior, Safari even only finds one occurrence.

Emilio, afaik there is no spec for find-on-page. Since none of the browsers show the behavior expected in comment 0 (and the behavior I would have expected), what do you think about this, is this something we should fix?

Flags: needinfo?(emilio)
  1. Open data:text/html,ababab<br>aba
  2. Ctrl+F, type aba
  3. Click NEXT
  4. Click PREVIOUS

Matching parts are different between 'NEXT' and 'PREVIOUS'.
I think they should be the same.

Thanks, yeah that's definitely a bug. That also doesn't repro in Chrome or Safari.

Seems worth fixing, even if relatively obscure... Might be worth checking if this is a regression, but my guess from how the find code works is that it is not.

I think the bug is around here.

We start the search at the end of the current range (when forward, opposite when searching backwards).

It seems instead we should start at the next character past the beginning of the range (or something like that?). Something like "start at the current start, but make sure to skip one character before starting the match" (because otherwise we'd infinitely loop).

Flags: needinfo?(emilio)

Not sure -- isn't the purpose of nsFind to just find the first occurrence of the pattern in a range and then return? If I understand correctly, your proposal would make nsfind not find the first match, because it would start by skipping the first character. The signature of nsFind::Find() with all these ranges is super-weird though, it feels redundant to have the whole search range, the start point (which is a collapsed range) and the end point (which is also a collapsed range). Would be easier to understand if there would only be the search range, which would have its boundaries moved to the start/end points.

However, I think we would need to update the callers. Here for example nsfind is called in a loop, and the start point is the last result collapsed to the end. I think this needs to be updated to be "one character behind the range's start", I assume we don't have a helper for that. :) Tried it out briefly, looks like there must be more places. Fixing this only corrected the number of results, but F3'ing through the results doesn't find all matches.

(In reply to Jan Jaeschke [:jjaschke] from comment #4)

Emilio, afaik there is no spec for find-on-page.

There isn't, but long-term, I think we should work on standardizing an extension of the basic idea of what Gecko does instead of baking collator-based search into specs. See https://github.com/tc39/ecma402/blob/main/meetings/notes-2024-05-23.md#add-locale-sensitive-substring-matching-functions-to-intlcollator

Attached image image.png

So, I could manage to write a fix for this. However, I'm not sure about UX:

STR:

  1. Open data:text/html,ababab<br>aba
  2. Ctrl+F, type aba
  3. Select "Highlight All"

When "Highlight All" is selected, the first two results are shown as one "ababa" range (see attached image). Maybe it would be nicer if that were somewhat separated?

Assignee: nobody → jjaschke
Status: NEW → ASSIGNED

Here's a try run.

Looks like there are a few other tests affected.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: