Closed
Bug 810636
Opened 12 years ago
Closed 3 years ago
Poor copy & paste behavior with pdf.js
Categories
(Firefox :: PDF Viewer, defect, P2)
Tracking
VERIFIED
FIXED
95 Branch
People
(Reporter: RyanVM, Unassigned)
Details
(Whiteboard: [pdfjs-c-ux][pdfjs-d-text-search][pdfjs-d-text-selection])
Attachments
(7 files)
Attached are a couple PDFs that are producing copy & paste behavior that doesn't seem to be correct.
Blah is a Word document I created myself and converted to PDF using PDFCreator. When I open it with Adobe Reader, I can copy & paste the text into a text editor exactly as it shows. When I open it with pdf.js, the third line pastes with an extra space at the end of it. To summarize:
Expected:
Blah
Blah
Blah
Actual:
Blah
Blah
Blah <-- Extra space
Actiontec is a PDF from the manufacturer of my wireless router. It behaves much worse. When trying to copy & paste "Verizon FiOS Router" at the top, there are several issues. First, it is very difficult to select just that text without it trying to select much of the body text as well. Second, it seems to miss the last letter of each line. Third, it inserts a new paragraph after each character. Adobe Reader is able to copy & paste the text fine.
Expected:
Verizon
FiOS
Router
Actual:
V
e
r
i
z
o
n
F
i
O
S
R
o
u
t
e
Comment 1•12 years ago
Comment 2•12 years ago
Updated•12 years ago
Priority: -- → P2
Whiteboard: [pdfjs-c-ux][pdfjs-d-text-search][pdfjs-d-text-selection]
Comment 3•12 years ago
Build identifier: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20130103 Firefox/19.0
Reproducible on Linux (Ubuntu 12.10), as well.
OS: Windows 7 → All
Comment 4•12 years ago
I think pdf.js should use span elements rather than div elements to create dummy layers for selection.
As of 2013-01-11, as reported in the duplicate bug 829686, it is breaking on words, not individual letters. Hopefully this is just progress or some quirk of the PDF, and the fix for both scenarios is the same.
Comment 7•12 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #0)
> into a text editor exactly as it shows. When I open it with pdf.js, the
> third line pastes with an extra space at the end of it. To summarize:
Now every line has an extra trailing space.
Comment 8•12 years ago
ATM we use a lot of single divs that hold the text on the PDF page. There can be multiple divs per line or even per word. If you then select a line, an extra newline is inserted between each div. That's just how the spec goes ;)
To improve this, we need to compute the string to copy to the clipboard ourselves (it's not as complicated as it may sound, since we already have the layout data in PDF.JS). This requires support for copy-event/cut-event clipboardData, which lets you change the content copied/cut from a page (as we need to do in this case). The copy-event/cut-event clipboardData support is very close to landing in bug #407983. Once that's in, I will do the necessary bits in PDF.JS.
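The approach described in this comment can be sketched roughly as follows. This is only an illustration, not the code that eventually landed in pdf.js: the `TextRun` shape, `runsToText`, `handleCopy`, and the gap and baseline thresholds are all hypothetical names and values chosen for the example. The idea is to rebuild the copied string from positioned text runs, so that per-letter or per-word divs do not each produce a newline.

```typescript
// Hypothetical shape of one positioned text run; the real pdf.js layout data differs.
interface TextRun {
  str: string;    // text of the run
  x: number;      // left edge, in page units
  y: number;      // baseline position (assumed to grow down the page)
  width: number;  // horizontal extent of the run
  height: number; // approximate font size
}

// Rebuild plain text: runs that share a baseline are joined into one line.
function runsToText(runs: TextRun[]): string {
  const sorted = runs.slice().sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: TextRun[][] = [];
  for (const run of sorted) {
    const line = lines[lines.length - 1];
    // Assumed heuristic: same line if baselines differ by under half the font size.
    if (line && Math.abs(line[0].y - run.y) < run.height / 2) {
      line.push(run);
    } else {
      lines.push([run]);
    }
  }
  return lines
    .map((line) => {
      line.sort((a, b) => a.x - b.x);
      let text = "";
      let prevEnd = 0;
      line.forEach((run, i) => {
        // Assumed heuristic: add a space only when there is a visible gap.
        if (i > 0 && run.x - prevEnd > run.height * 0.2) text += " ";
        text += run.str;
        prevEnd = run.x + run.width;
      });
      return text.replace(/\s+$/, ""); // drop trailing spaces on each line
    })
    .join("\n");
}

// Minimal structural type so the sketch compiles without DOM typings;
// a browser ClipboardEvent is expected to satisfy it.
interface CopyEventLike {
  clipboardData: { setData(format: string, data: string): void } | null;
  preventDefault(): void;
}

// Replace the default clipboard content with the rebuilt text.
function handleCopy(event: CopyEventLike, selectedRuns: TextRun[]): void {
  if (!event.clipboardData) return;
  event.clipboardData.setData("text/plain", runsToText(selectedRuns));
  event.preventDefault(); // keep the browser from copying the raw DOM selection
}
```

In a page, `handleCopy` would be called from a `copy` event listener, with `selectedRuns` derived from the current selection; how the selection is mapped to runs is left out of this sketch.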
Comment 10•12 years ago
I am using FF 19, and now selecting text does not select all the text. It'll select most of it, but it seems that the last word on every line is cut off, as well as a few in the middle.
Flags: needinfo?
Comment 12•11 years ago
Well, I agree that copying the selected text via the clipboard results in an incredibly badly formatted mess; fixing it would be really great. Right now it is almost unusable for regular text extraction... :-(
Comment 13•11 years ago
I have just seen this behaviour with this pdf file:
http://www.mairie-rochecorbon.fr/pdf/reglement.pdf
Hope a fix will come soon
Comment 14•11 years ago
I've run into this recently as well. Tested on both v24 and a several day old nightly.
http://alstomsignalingsolutions.com/Data/Documents/VCS_April_16_2013.pdf
See attached screen shot. A copy of the selected region resulted
in the following text in the clipboard.
"nt sensing require"
It should have been:
"sensing requirements of the"
Though it's hard to be sure because the displayed selection box is a little
sloppy.
Comment 15•11 years ago
Just to be clear, the warning from NoScript is because I have Javascript from the website disabled at that point, pdf.js is enabled. I later enabled the website's JS and it had no effect.
Comment 16•11 years ago
Oh, and I'm on a Mac running OS X 10.7.5.
Comment 17•11 years ago
Here's another example from a current version of Firefox 27.0.1 on Mac OSX 10.9.2
Using the PDF viewer built into Firefox on this document:
http://www.institutional-economics.com/images/uploads/randreview.pdf
An attempt to highlight and then copy the text shown in the attached image produced the errant highlight shown in that image and resulted in the following truncated text in the clipboard:
Tyler Cowen’s recent book,
Create Your Own Econ
A cut/paste from the same document viewed in Preview did the right thing. The highlight in the view and the text in the clipboard were correct.
Comment 18•11 years ago
Comment 19•11 years ago
Are there any plans to fix this issue in the near future?
Comment 20•10 years ago
I hope this gets fixed one day; it's really annoying to be forced to download and open this file in Adobe Reader just to copy text... (which is also not very good, since Adobe Reader is exposed to lots of attacks because of exploits)
Comment 21•10 years ago
At least it keeps the formatting (fonts and stuff), unlike Adobe Reader, if you copy it into Word (Office).
Comment 22•10 years ago
Any updates regarding this bug?
Comment 23•10 years ago
Yeah, I hope this is not expected behaviour :/
An update on the current state would be nice. Is anybody interested in providing a patch, maybe? (I would like to, but I don't have any knowledge about this at all.)
Especially since https://bugzilla.mozilla.org/show_bug.cgi?id=407983 is now integrated already :)
Comment 24•9 years ago
There's a thread started at mozillaZine recently regarding this: http://forums.mozillazine.org/viewtopic.php?f=38&t=2961299
Firefox 40.0.3 in Win 7 Pro also has wacky issues copying and pasting from PDFs using the built-in viewer.
Comment 26•6 years ago
No assignee, updating the status.
Comment 27•6 years ago
No assignee, updating the status.
Comment 28•6 years ago
When trying to copy text:
The in-and-out breaths, are bodily formation.
Thinking and pondering are verbal formation.
Perception and feeling are thought formation.
from
http://www.themindingcentre.org/dharmafarer/wp-content/uploads/2013/04/40a.9-Culavedalla-S-m44-piya.pdf
produces
The in
-
and
-
out breath
s
,
are
bodily formation.
120
Thinking and
ponder
ing
are
verbal formation.
121
Perception
and
feeling
are
thought formation
.‖
12
Comment 29•5 years ago
"Opened 7 years ago".
This is the kind of thing that gets people to switch to Chrome.
Comment 30•3 years ago
It should be fixed thanks to https://github.com/mozilla/pdf.js/pull/13424.
pdf.js has been updated in m-c (see bug 1737299) so the fix is available in nightly.
Updated•3 years ago
status-firefox95: --- → fixed
status-firefox-esr91: --- → fixed
Depends on: 1748536
Target Milestone: --- → 95 Branch
Comment 31•3 years ago
I have reproduced this issue in ESR v91.4.1esr and verified the fix in ESR v91.5.0esr, Release v95.0.2, and Nightly v97.0a1 on Windows 10, Mac OS 11.6.2, and Ubuntu 20.04.3 LTS.
Status: RESOLVED → VERIFIED