PDF search should default to unlimited whitespace
Categories
(Firefox :: PDF Viewer, enhancement, P1)
People
(Reporter: Nick_Levinson, Assigned: calixte)
Details
(Whiteboard: [pdfjs-text-search])
PDFs often include additional spaces between spaced-apart words, even though the additional spaces are not displayed and even without a line break intervening. This affects the Firefox search function. So, when I searched the document at the URL for "foreign law" (with single space) (all search terms are without quote marks), the results did not include any for "foreign  law" (with double-space). To see both strings on one page, see p. 459 (p. 471 per PDF reader).
Because the double-space is displayed as a single space by the PDF reader in Firefox, the only way to expose the extra space is to copy the string that did not match the search string from the PDF into a text editor or other appropriate app.
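As an illustration of that check, here is a minimal Python sketch that lists every whitespace character in a string pasted from the PDF. The helper name `describe_whitespace` and the sample string are made up for this example; nothing here comes from the PDF viewer itself.

```python
import unicodedata

def describe_whitespace(text: str) -> list:
    """Return (position, Unicode name) for every whitespace character in text."""
    found = []
    for i, ch in enumerate(text):
        if ch.isspace():
            # Fall back to the code point when the character has no Unicode name.
            found.append((i, unicodedata.name(ch, "U+%04X" % ord(ch))))
    return found

# Hypothetical string copied from the PDF: two spaces between the words.
copied = "foreign" + " " * 2 + "law"
print(repr(copied))
print(describe_whitespace(copied))
```

The same helper would also show whether a PDF uses whitespace other than the ordinary space, for example a no-break space.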
PDFs often have hidden additional spaces. This may happen often in justified text, but I don't know if it also happens in unjustified text. I don't know if kerning or tracking, forward or reverse, affects this issue, too. In this URL, where most text is justified, the normal single-spaced string occurs 28 times but the unexpected double-spaced version occurs 55 times, meaning that if I hadn't noticed the problem I would have missed about two thirds of the desired matches.
I don't know if PDFs, which are now made in multiple competing apps, use whitespace other than ordinary breaking spaces in this context.
Solution: The search function should continue to execute literally. Therefore, I suggest adding an option to the search bar to allow additional spaces, perhaps unlimited whitespace of any kind.
If the search string consists only of spacing, the option would not appear or would be dimmed. The option would not apply to leading or trailing spacing. In both cases, the matches would be found anyway.
The option, if selected, would accept any number of consecutive spaces wherever a space occurs within the search string. Thus, for example, if the search is for "sardine doughnut juice", the following would be found in the same search:
sardine doughnut juice
sardine doughnut       juice (with 6 extra spaces after "doughnut")
sardine     doughnut    juice (with 4 and 3 extra spaces, respectively)
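One way to picture the proposed option is as a regular expression built from the search string. This is only a sketch in Python under my own assumptions, not how the Firefox PDF viewer implements search. The function name `flexible_space_pattern` is hypothetical, and dropping leading and trailing whitespace from the query is a simplification.

```python
import re

def flexible_space_pattern(query: str) -> re.Pattern:
    """Let each whitespace run in the query match one or more whitespace characters."""
    # Assumption: leading and trailing whitespace in the query is simply dropped.
    words = query.split()
    # Escape each word so it is matched literally, then join with "one or more whitespace".
    return re.compile(r"\s+".join(re.escape(word) for word in words))

pattern = flexible_space_pattern("sardine doughnut juice")
candidates = [
    "sardine doughnut juice",
    "sardine doughnut" + " " * 7 + "juice",             # 6 extra spaces
    "sardine" + " " * 5 + "doughnut" + " " * 4 + "juice",  # 4 and 3 extra spaces
]
for candidate in candidates:
    print(repr(candidate), bool(pattern.search(candidate)))
```

Because `\s` matches any whitespace character, this sketch would also accept tabs or line breaks between the words, which corresponds to the "unlimited whitespace of any kind" variant.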
If the option is selected, the number of spaces supplied by the user should be the minimum number of spaces when the search is executed. For example, if the search is for "train bus  plane" (a single space and then a double-space), results would include "train   bus  plane" (triple- and double-space, thus more than the search string) but would not include "train bus plane" (single-spaces only, thus less than the search string).
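The minimum-count rule can be sketched the same way: each run of n whitespace characters in the query becomes "at least n whitespace characters" in the pattern. Again, this is an illustrative Python sketch under my own assumptions, with a hypothetical function name `min_space_pattern`; it is not the viewer's actual code.

```python
import re

def min_space_pattern(query: str) -> re.Pattern:
    """Treat each whitespace run of length n in the query as a minimum of n."""
    # Split into alternating words and whitespace runs, keeping the runs.
    # Assumption: leading and trailing whitespace in the query is dropped.
    parts = re.split(r"(\s+)", query.strip())
    pieces = []
    for part in parts:
        if part.isspace():
            # A run of n whitespace characters must match at least n.
            pieces.append(r"\s{%d,}" % len(part))
        else:
            # Words are matched literally.
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))

# Query: single space, then double space.
query = "train bus" + " " * 2 + "plane"
pattern = min_space_pattern(query)

more_spaces = "train" + " " * 3 + "bus" + " " * 2 + "plane"
single_spaces = "train bus plane"
print(bool(pattern.search(more_spaces)))    # triple- and double-space
print(bool(pattern.search(single_spaces)))  # single spaces only
```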
The FF version is 78.0.1 (64-bit).
(The URL was as accessed 5-16-20 & 7-5-20. If this Bugzilla or your browser is not showing consecutive spaces in the above, I typed them.)
Comment 2•3 years ago
Find in text is fixed. The highlight is a bit off on page 471, but it is acceptable.