find functionality creates terms in pdf viewer pages
Categories
(Firefox :: PDF Viewer, defect, P1)
Tracking
()
People
(Reporter: karlden, Assigned: calixte)
References
Details
(Whiteboard: [bugday-20140113][pdfjs-ux][pdfjs-text-search])
Attachments
(1 file)
238.02 KB,
image/png
Comment 1•11 years ago
Comment 5•4 years ago
7 years later, this bug is still present.
Using:
Firefox 79.0 on desktop
Windows 10
Here is a link to the original document using archive.org's Wayback Machine:
https://web.archive.org/web/20060818161558/http://archive.dovebid.com/brochure/bro1514.pdf
The following results come from the PDF viewer's own find bar at the top of the viewer (not from Firefox Menubar -> Edit -> Find in This Page):
Searching for "WorkBenches" matches "Work" at the end of one line followed by "Benches", a separate word, at the start of the next line.
Searching for "WorkBenches" with the find bar option "Whole words" ticked does not yield any results.
Copying and pasting from the PDF in Firefox into a text editor gives the following text:
"BENCH-TOP ARBOR PRESSES, (2) FAMCO NO. 2, (2) SHELDON NO. 2Pedestal Fans, Portable Heaters, Washers, Sweepers, Shop Vacs, WorkBenches, Hand Tools, Vises, Supply Cabinets, Flammable Safety StorageCabinets, Tool Storage Cabinets, Extension & Safety Ladders, Carts, PalletJacks, Work Fixtures, Horses, Stands, Etc."
I'm not familiar enough with PDFs or Firefox's rendering to posit any explanations, but hopefully this will be useful (someday) for someone who continues diagnosing the issue.
Have a wonderful day.
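For readers following along, the symptom above can be modelled with a minimal sketch. It assumes (this is not confirmed anywhere in this bug) that the searchable page text is built by joining per-line text chunks without any separator, which would be consistent with the pasted text showing "2Pedestal", "StorageCabinets" and "PalletJacks". The `lines` data and the `find_all` helper are hypothetical and are not PDF.js code.

```python
# Hypothetical per-line text chunks; the line break falls between
# "Work" and "Benches", and neither chunk carries whitespace at the break.
lines = ["Shop Vacs, Work", "Benches, Hand Tools"]


def find_all(query, page_text):
    """Return the start offsets of every case-insensitive match of query."""
    haystack = page_text.lower()
    needle = query.lower()
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


# Joining with no separator fuses the two words into a term that does not
# exist on the page, so a search for "WorkBenches" finds a match.
text_no_separator = "".join(lines)

# Joining with a space keeps the words apart, so only the real phrase matches.
text_with_space = " ".join(lines)

phantom_hits = find_all("WorkBenches", text_no_separator)
real_hits = find_all("Work Benches", text_with_space)
no_hits = find_all("WorkBenches", text_with_space)
```

Under this assumption, the phantom match disappears as soon as a line break is treated as whitespace. The sketch does not try to model the "Whole words" behaviour reported above.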
Comment 7•3 years ago
https://github.com/mozilla/pdf.js/pull/13261 should fix this.
Comment 8•3 years ago
The "Work Benches" bug from the first PDF is fixed. The "aisle Twin" bug from the second PDF is fixed too. "aisle 226" from the second PDF is not fixed (though I'm not sure what the correct behavior is).
Calixte, could you check with Adobe Reader?
Comment 9•3 years ago
"aisle Cross Section 226" finds a result ("aisle" and "226" are in the 7th page on two following lines, "Cross Section" is in the 6th page)
Comment 10•3 years ago
(In reply to Marco Castelluccio [:marco] from comment #8)
> "aisle 226" from the second PDF is not fixed (though I'm not sure what's the correct behavior).
Interestingly, that can also be reproduced in PDFium (in Google Chrome), and it seems that the problem isn't related to the search functionality as such but rather to the actual contents of the textLayer.
From a very cursory look, it appears that some of the textContent is being positioned (by the PDF document itself) in such a way that it ends up outside of the visible pages. This is apparently affecting both PDF.js and PDFium, but not Adobe Reader as far as I can tell.
Comment 11•3 years ago
I think we should skip strings which are not in the page bounding box when we're creating text chunks.
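A rough sketch of that idea, assuming each text string comes with an anchor position in page coordinates and the page box is given as `(x0, y0, x1, y1)`. The `TextItem` type, its fields, and both helper functions are hypothetical names for illustration only; they are not the PDF.js data model, and a real implementation would presumably also need to account for glyph extents and transforms.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical text string with its anchor point in page coordinates."""
    text: str
    x: float
    y: float


def is_inside(item, box):
    """True if the item's anchor point lies within box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= item.x <= x1 and y0 <= item.y <= y1


def visible_items(items, box):
    """Keep only the items anchored inside the page box, preserving order."""
    return [item for item in items if is_inside(item, box)]


# Example: a US Letter page box in points, with one string placed above it.
page_box = (0.0, 0.0, 612.0, 792.0)
items = [
    TextItem("aisle", 72.0, 700.0),
    TextItem("Cross Section", 72.0, 900.0),  # anchored outside the page box
    TextItem("226", 72.0, 686.0),
]
kept = [item.text for item in visible_items(items, page_box)]
```

With the off-page string dropped before text chunks are built, it can no longer take part in a match. The coordinates in the example are made up and do not come from the PDFs discussed in this bug.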
Comment 12•3 years ago
I filed a separate bug for the issue described in comment 9 (https://bugzilla.mozilla.org/show_bug.cgi?id=916883#c9):
https://bugzilla.mozilla.org/show_bug.cgi?id=1755201