PDF Viewer does not highlight the searched string correctly on a PDF file
Categories: Firefox :: PDF Viewer, defect, P3
People: Reporter: ishikawa, Unassigned
Attachments (11 files):
- 533.06 KB, application/pdf
- 524.33 KB, image/png
- 537.10 KB, image/png
- 555.83 KB, image/png
- 247.54 KB, image/png
- 1.19 MB, image/png
- 248.57 KB, image/png
- 613.84 KB, image/png
- 217.07 KB, image/png
- 310.42 KB, application/pdf
- 61.47 KB, image/png
Attach (recommended) or Link to PDF file here:
https://www.cisa.gov/uscert/sites/default/files/ICSJWG-Archive/QNL_SEP_20/Protecting%20Embedded%20Systems%20-%20Verve%20Industrial_S508NC.pdf
I am attaching it.
Steps to reproduce the problem:
- Try searching for "configuration"
- Look at the highlighted searched position.
The highlight is off by a few characters.
If I search for "OT", the result is dismal. I cannot even tell where the PDF viewer thinks it has found the string "OT".
Comment 1 • 3 years ago (Reporter)
I searched for "configuration", and the attached screenshot shows that the highlight
- is at the wrong position, and
- is much shorter than the word "configuration".
Comment 2 • 3 years ago (Reporter)
I searched for "OT", but the highlighted position is way off. It usually seems to highlight a spot a few characters before where the string "ot" actually appears.
Comment 3 • 3 years ago (Reporter)
Actually, I was looking for "OT", the capitalized abbreviation, in this document. Because the highlighted match positions were completely bogus, I first thought the PDF viewer was not finding it and was producing garbage, but I now think it does find the string and gets its position wrong.
Comment 4 • 3 years ago (Reporter)
I gave up on the PDF viewer, downloaded the PDF, and used Adobe Acrobat Reader.
To my surprise, when I searched for "OT", Acrobat Reader HIGHLIGHTED WORDS that contain the string "OT", not just the OT portion itself!
This is a strange PDF.
Comment 5 • 3 years ago (Reporter)
When I searched for "configuration" using Acrobat Reader, it highlighted both "configuration" and "configurations".
I did not know a PDF could be created like this.
Comment 6 • 3 years ago
The severity field is not set for this bug.
:calixte, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 7 • 3 years ago
I'd say it's because of the font used in the text layer.
Could you take a screenshot of https://bug1808983.bmoattachments.org/attachment.cgi?id=9311134#page=14&textLayer=visible and attach it to the bug?
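Comment 7's explanation can be illustrated with a minimal sketch. It assumes that the invisible text layer is laid out in a substitute font whose per-character widths differ from the embedded font that paints the visible glyphs, and that each text-layer run is uniformly stretched so its total width matches the painted run. Both width tables are invented numbers, not measurements from the attached PDF, and the helper names are hypothetical, not pdf.js functions.

```python
# Hypothetical advance widths in 1/1000 em; NOT measured from the attached PDF.
# PAINTED: the embedded font that draws the visible glyphs.
PAINTED = {"a": 520, " ": 250, "c": 500, "o": 560, "n": 560, "f": 300,
           "i": 250, "g": 560, "u": 560, "r": 350, "t": 320}
# LAYER: an assumed monospace-like substitute font used for the text layer.
LAYER = {ch: 600 for ch in PAINTED}

LINE = "fit into a configuration"


def char_offsets(text, widths, scale=1.0):
    """Cumulative x offset (1/1000 em) of every character boundary in text."""
    xs = [0.0]
    for ch in text:
        xs.append(xs[-1] + widths[ch] * scale)
    return xs


def match_box(text, needle, widths, scale=1.0):
    """(x_start, x_end) of the first occurrence of needle in text."""
    i = text.find(needle)
    if i < 0:
        raise ValueError(f"{needle!r} not found")
    xs = char_offsets(text, widths, scale)
    return xs[i], xs[i + len(needle)]


def stretched_layer_box(text, needle, painted, layer):
    """Match box in the text layer after stretching the run to the painted width."""
    scale = char_offsets(text, painted)[-1] / char_offsets(text, layer)[-1]
    return match_box(text, needle, layer, scale)


if __name__ == "__main__":
    painted = match_box(LINE, "configuration", PAINTED)
    layer = stretched_layer_box(LINE, "configuration", PAINTED, LAYER)
    print("painted glyphs:", painted)
    print("text layer:    ", tuple(round(x, 1) for x in layer))
```

With these made-up widths, the whole line lines up, yet the text-layer box for "configuration" starts later and is shorter than the painted word, which is the same kind of drift described in comment 1.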
Comment 8 • 3 years ago (Reporter)
(In reply to Calixte Denizet (:calixte) from comment #7)
> I'd say it's because of the font used in the text layer.
> Could you screenshot: https://bug1808983.bmoattachments.org/attachment.cgi?id=9311134#page=14&textLayer=visible and attach it to the bug ?
A screenshot from within Firefox?
Here it is.
Comment 9 • 3 years ago (Reporter)
EDIT: correction. More correction.
Does the behavior of the PDF viewer in Firefox change depending on where I view the PDF from?
No, the result is the same. It is just that Firefox shows the correct highlighting in some places and incorrect highlighting in others.
I still get incorrect highlighting when I view the PDF attached to this bug:
https://bugzilla.mozilla.org/attachment.cgi?id=9311134
When I open the PDF via the URL above and search for "ot", I get incorrect highlighting, BUT Firefox sometimes gets it right in other places.
Ah, my previous post was useless because it only contained the cover page (?)
But I found out something. The original comment was posted using Firefox 108.
Now I am using Firefox 109.0 and the problem is not noticeable any more(!?) <--- I now wonder where and how the problem went unnoticed.
But please recall that I also got a very strange search result with Adobe Acrobat Reader (comment 4).
I just checked, and Adobe Acrobat Reader still highlights the whole word containing "ot" when I search for "ot".
Comment 10 • 3 years ago (Reporter)
There are both correct and incorrect highlights in the same PDF (the attachment).
I searched for "ot" using the Firefox PDF viewer.
Comment 11 • 3 years ago
There is probably something weird in the document's content.
I agree with the following points from https://superuser.com/a/561603:
- It may have a custom font encoding that assigns code points to characters in a way that is incompatible with established encodings such as ASCII or UTF-8/Unicode.
- It may render characters individually out of sequence.
- It may have had characters flattened to paths.
If you are able to share the PDF used for testing and are still seeing the issue, we can validate the PDF's encoding and how it behaves with the Firefox search feature.
Searching in the FF111 PDF viewer yields the expected results.
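The second point (characters rendered out of sequence) can be illustrated with a minimal sketch. It assumes each glyph is known only as a (character, x position) pair in the order the content stream draws it; the glyph list is invented and the helpers are hypothetical. Concatenating glyphs in drawing order produces a spurious "OT" match on the wrong glyphs, while sorting by x (valid only for a single left-to-right line) restores the visible word.

```python
from typing import List, NamedTuple, Optional


class Glyph(NamedTuple):
    char: str
    x: float  # left edge on the page in points (hypothetical values)


# The word "TOOL" as a producer might draw it: the first "O" is painted
# before the "T", so drawing order differs from visual order.
DRAWN: List[Glyph] = [Glyph("O", 17.0), Glyph("T", 10.0),
                      Glyph("O", 24.0), Glyph("L", 31.0)]


def text_in_stream_order(glyphs: List[Glyph]) -> str:
    return "".join(g.char for g in glyphs)


def text_in_reading_order(glyphs: List[Glyph]) -> str:
    # Single-line, left-to-right assumption: reading order is ascending x.
    return "".join(g.char for g in sorted(glyphs, key=lambda g: g.x))


def find_glyph_positions(glyphs: List[Glyph], needle: str,
                         reorder: bool = True) -> Optional[List[float]]:
    """x positions of the glyphs matched by needle, or None if there is no match."""
    ordered = sorted(glyphs, key=lambda g: g.x) if reorder else list(glyphs)
    text = "".join(g.char for g in ordered)
    i = text.find(needle)
    if i < 0:
        return None
    return [g.x for g in ordered[i:i + len(needle)]]


if __name__ == "__main__":
    print(text_in_stream_order(DRAWN), text_in_reading_order(DRAWN))
    print(find_glyph_positions(DRAWN, "OT", reorder=False))  # spurious hit
    print(find_glyph_positions(DRAWN, "OT"))  # no "OT" in the visible word
```

A real viewer also has to cope with multiple lines, rotated text, and right-to-left scripts, so sorting by x alone is only a toy model of reading-order reconstruction.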
Comment 12 • 3 years ago (Reporter)
(In reply to kesavan from comment #11)
> Should be something weird with the document content source.
> I agree with the below discussion ; Per https://superuser.com/a/561603:
> - It may have a custom font encoding that assigns code points to characters in a way that is incompatible with established encodings such as ASCII or UTF-8/Unicode.
> - It may render characters individually out of sequence
> - It may have had characters flattened to path
Possibly.
> If you able to share the PDF used for testing and still seeing the issue , we may validate the PDF encoding and how it behave with the FireFox search feature.
The PDF is the first attachment in this bug:
https://bug1808983.bmoattachments.org/attachment.cgi?id=9311134
> Searching in FF111 PDF is yielding expected results
Are you sure?
With the PDF (the first attachment in this bug) and Firefox 111.0 on Windows 10, I still get the same very strange match results as in comment 2.
Just search for "OT"; many of the highlighted positions are strange.
Comment 13 • 2 years ago
I can replicate this on other files. The font used in the phrase "In the Supreme Court of the United States" appears on most US Supreme Court briefs, so we are talking about tens of thousands to hundreds of thousands of documents with this phrase and font. To reproduce, use the "Text selection tool" to highlight various individual letters in the line "In the Supreme Court of the United States" in this document:
https://www.supremecourt.gov/DocketPDF/22/22-915/275781/20230821132616436_22-915_Amicus%20Curiae%20Brief.pdf
Firefox is unable to properly highlight the letter boundaries.
Google Chrome and Adobe Acrobat Reader highlight the letter boundaries properly. Right now Firefox lacks parity with Chrome on this bug.
For example, here are screenshots of what highlighting the letters "Su" on the first page of this PDF looks like in Firefox, Chrome, and Adobe Acrobat Reader.
Comment 14 • 2 years ago
Comment 15 • 2 years ago
Comment 16 • 2 years ago
Microsoft's Edge browser also renders this as expected. All 4 programs render the character selection differently, but only Firefox gets it wrong.
Comment 17 • 2 years ago (Reporter)
(In reply to George from comment #16)
> Created attachment 9355473
> Su in Edge.png
>
> Microsoft's Edge browser also renders this as expected. All 4 programs render the selection of the characters differently, but only Firefox doesn't do it correctly.
Hi, the original reporter here.
Regarding "All 4 programs render the selection of the characters differently": do you mean the color and height of the marked region?
Anyway, if other programs can do the selection correctly, the PDF file must contain the information needed to do so.
Firefox ought to use THAT information.
When a basic function such as highlighting the selected area does not work correctly, it disrupts the user's workflow considerably.
I hope someone can work on this.