Open Bug 1904970 Opened 8 months ago Updated 8 months ago

PDF viewer find-in-page unexpectedly jumps from one match to the next (instead of extending the current match) as you type additional characters, in certain PDFs

Categories: Firefox :: PDF Viewer, defect, P3

People: dholbert (reporter), unassigned

Attachments: 4 files

Attach (recommended) or Link to PDF file here:
https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf
(also attached)

Steps to reproduce the problem:

  1. Load the testcase.
  2. Ctrl+F to activate Find-in-page UI.
  3. Type "pdf" one character at a time, watching carefully what happens as you type each character.

What is the expected behavior?
The token "PDF" at the bottom left of page 1 should be the matched string, and it should remain the matched string as you type the full "pdf" token.

What went wrong?
With each character, Firefox jumps to a match on the following page! So at the end of the STR, when you've typed "PDF" (3 characters), it's highlighting a match on page 3 instead of on page 1.

You can reproduce this with other words in the PDF as well, e.g. the word "bookmark".

Random guess at diagnosis: Maybe the geometry of this PDF file is somehow messed up, such that when we scroll to show the match, find-in-page thinks that the match is scrolled off the top of the screen, and so the next character forces us to jump to the next match that we can find?

Another set of steps to reproduce a related issue:
Same as comment 0, but type "data" instead of "PDF".

EXPECTED RESULTS:

  • When I've typed "da" and "dat", it should match "Sample Date" at the top of page 1
  • When I've typed "data", it should match the word "data" in the second sentence of the "Overview" section on page 1 ("The data file ...")

ACTUAL RESULTS:
The first character "d" matches the word "PDF" at the bottom left of page 1, so we scroll to show that match. That scroll seems to prevent us from finding the matches further up on page 1: we end up highlighting "Data" (in the "Sample Data File" heading) at the top of page 2, instead of the matches for "da"/"dat"/"data" on page 1.

Summary: PDF viewer find-in-page unexpectedly jumps from one match to the next as you type additional characters, in certain PDFs → PDF viewer find-in-page unexpectedly jumps from one match to the next (instead of extending the current match) as you type additional characters, in certain PDFs
Attachment #9409804 - Attachment description: screencast of bug → screencast of "da" jumping to a match on page 2 instead of page 1

The "pdf" at the bottom of the first page is the first text element on that page, which is why it's the first match.
But when we scroll to put this "pdf" at the top of the viewer, the current page becomes the second one, and the next match is then searched for starting on the current page, and so on.
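A toy Python model can make this sequence concrete. It is a sketch under stated assumptions, not pdf.js code: each page is a list of text runs in content-stream order, the "current page" is assumed to be the page under the viewport's midpoint, each keystroke restarts the search at the top of the current page, and the page contents, sizes and all names (`Run`, `PAGES`, `current_page`, `find_from_current_page`, `type_incrementally`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical geometry, in arbitrary units (not taken from the real PDF).
PAGE_HEIGHT = 1000
VIEWPORT_HEIGHT = 800


@dataclass
class Run:
    text: str
    y: int  # vertical offset of the run within its page (0 = top of page)


# Hypothetical document loosely shaped like the testcase: on page 1, the
# "PDF" run at the bottom of the page comes first in the content stream.
PAGES: List[List[Run]] = [
    [Run("PDF", 950), Run("Sample Date", 50), Run("The data file", 300)],
    [Run("Sample Data File", 40), Run("PDF", 950)],
    [Run("PDF", 100)],
]


def current_page(scroll_top: int) -> int:
    """Assumed rule: the current page is the one under the viewport's midpoint."""
    mid = scroll_top + VIEWPORT_HEIGHT // 2
    return min(mid // PAGE_HEIGHT, len(PAGES) - 1)


def find_from_current_page(query: str, start: int) -> Optional[Tuple[int, Run]]:
    """Scan pages from `start` (wrapping), each page's runs in content-stream order."""
    q = query.lower()
    for i in range(len(PAGES)):
        page = (start + i) % len(PAGES)
        for run in PAGES[page]:
            if q in run.text.lower():
                return page, run
    return None


def type_incrementally(word: str) -> List[int]:
    """Return the 1-based page of the match after each typed character."""
    scroll_top, pages_hit = 0, []
    for n in range(1, len(word) + 1):
        hit = find_from_current_page(word[:n], current_page(scroll_top))
        if hit is None:
            break  # no match; a real viewer would report "not found"
        page, run = hit
        # Reveal the match by scrolling it to the top of the viewport.
        scroll_top = page * PAGE_HEIGHT + run.y
        pages_hit.append(page + 1)
    return pages_hit
```

With this toy layout, typing "pdf" visits pages 1, 2 and 3, and typing "data" ends on the "Sample Data File" heading of page 2, mirroring the two reports above.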

Severity: -- → S3
Priority: -- → P3

A fix could be to ignore the current page when the scrolling wasn't initiated by the user.
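One way to express that proposal, as a hedged sketch: remember whether the most recent scroll came from the user, and when it did not (because find itself scrolled to reveal a match), start the next search from the previous match's page instead of the current page. `FindState`, `on_scroll` and `start_page` are hypothetical names, not the pdf.js find controller's API, and falling back to the previous match's page is one assumed reading of "ignore the current page".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FindState:
    """Hypothetical find-controller state (illustration only, not pdf.js)."""
    last_match_page: Optional[int] = None  # page of the previous match, if any
    user_scrolled: bool = False  # whether the most recent scroll came from the user

    def on_scroll(self, by_user: bool) -> None:
        # Scrolls that find performs to reveal a match pass by_user=False.
        self.user_scrolled = by_user

    def start_page(self, current_page: int) -> int:
        """Pick the page to start searching from when the query changes."""
        if self.user_scrolled or self.last_match_page is None:
            # The user moved the view (or there is no previous match):
            # honor the page they are looking at.
            return current_page
        # The view only moved because find revealed a match: ignore the
        # current page and keep searching from the previous match's page.
        return self.last_match_page
```

In the "data" case, the "d" match is on page 1, so under this rule the search for "da" would start on page 1 again rather than on page 2.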

Hmm, that's not what happens with regular web content, though; we seem to just use DOM order, and scroll position doesn't matter. I'll post a testcase to illustrate.

(Maybe regular web content does behave like you're describing with certain sorts of markup, though -- not sure. If you think it does, maybe you could post a reduced testcase to demonstrate?)

(In reply to Daniel Holbert [:dholbert] from comment #6)

> Hmm, that's not what happens with regular web content, though; we seem to just use DOM order, and scroll position doesn't matter. I'll post a testcase to illustrate.

Here's a testcase to illustrate this.

If you load this testcase, press Ctrl+F, and then type "Z", you'll jump all the way to the end, to a "ZX" match (the first text node in the DOM containing "Z").

Then if you type "Z" again (to find "ZZ"), Firefox scrolls back up to the top to show you the first text node in the DOM that contains "ZZ". This happens regardless of whether the user has scrolled at all. Notably, there's another text node at the bottom of the document containing "ZZ", just below "ZX", but we don't advance immediately to that one when you've typed "ZZ". We first match the "ZZ" at the top of the document, even though that involves scrolling backwards.

I think the only thing that sets a special starting-point for find-in-page is the selection state, not the scroll position.

So e.g. if you visit https://en.wikipedia.org/wiki/Mozilla and scroll halfway down, and then do find-in-page for "Mozilla", then Firefox (and Chrome) snaps you back up to the very top of the document (to the first match). Whereas: if you repeat that process, and click-and-drag to select some text in the middle of the article before you start your find-in-page operation, then Firefox starts the find operation from that point (the start of the selected text).
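The behavior described in the last few comments can be modeled as a linear scan over text nodes in DOM order that starts at the selection when there is one, and at the top of the document otherwise; scroll position is simply not an input. This is a simplified Python sketch, not Firefox's implementation: treating the current match as the selection while the user keeps typing is an assumption, and the node lists (including a DOM order where "ZX" precedes the top "ZZ") are hypothetical stand-ins for the testcase and the Wikipedia article.

```python
from typing import List, Optional


def find_in_dom_order(nodes: List[str], query: str,
                      selection_start: Optional[int] = None) -> Optional[int]:
    """Return the index of the first text node containing `query`.

    The scan follows DOM order, starting at `selection_start` (or at the top of
    the document when nothing is selected) and wrapping around at the end.
    Scroll position is deliberately not a parameter.
    """
    start = 0 if selection_start is None else selection_start
    q = query.lower()
    for i in range(len(nodes)):
        idx = (start + i) % len(nodes)
        if q in nodes[idx].lower():
            return idx
    return None


# Hypothetical DOM order for the "Z" testcase: "ZX" (rendered at the bottom),
# then the "ZZ" rendered at the top, then the "ZZ" just below "ZX".
z_nodes = ["ZX", "ZZ", "ZZ"]
first = find_in_dom_order(z_nodes, "Z")
# Assumption: while typing, the current match acts as the selection.
second = find_in_dom_order(z_nodes, "ZZ", selection_start=first)
```

Here `first` is 0 (the "ZX" node) and `second` is 1 (the top "ZZ"), not 2, matching the testcase description: DOM position decides, and the layout never enters the calculation.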

Maybe PDF.js is creating a selection of some sort during its find-in-page operation, which is triggering this special find-in-page behavior where it skips ahead to a nondefault starting point?

It used to work as you expect, but the behavior was changed in:
https://github.com/mozilla/pdf.js/pull/3941
Chrome and Edge don't use the current page to find the next match.
Interestingly, Acrobat does use the current page, but its match order follows the visual order, so it doesn't take the position in the content stream into account.
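The difference between the two orderings comes down to the sort key applied to the matches on a page. This Python sketch contrasts content-stream order with a simplified visual order (top-to-bottom, then left-to-right); the `Match` type, its fields and the coordinates are hypothetical, and Acrobat's actual reading-order rules are not known from this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    """Hypothetical match record (illustration only)."""
    text: str
    page: int           # 1-based page number
    stream_index: int   # position of the text run in the page's content stream
    x: float            # left edge, in page units
    y: float            # top edge, measured downward from the top of the page


def content_stream_order(matches: List[Match]) -> List[Match]:
    """Order matches by page, then by where their text appears in the content stream."""
    return sorted(matches, key=lambda m: (m.page, m.stream_index))


def visual_order(matches: List[Match]) -> List[Match]:
    """Order matches by page, then top-to-bottom, then left-to-right.

    A simplified reading order: a real viewer would also need to group runs
    into lines (tolerating small y differences) and handle multiple columns.
    """
    return sorted(matches, key=lambda m: (m.page, m.y, m.x))
```

With a "PDF" run that is first in the stream but sits at the bottom of page 1, and a "Sample Date" run that is second in the stream but sits at the top, content-stream order puts "PDF" first while visual order puts "Sample Date" first.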
