Closed
Bug 844006
Opened 11 years ago
Closed 11 years ago
Copying text from PDF document inserts random whitespace or newline.
Categories
(Firefox :: PDF Viewer, defect)
RESOLVED
DUPLICATE
of bug 810636
People
(Reporter: ishikawa, Unassigned)
Details
When I copy text from the inline PDF viewer under Windows (Firefox 19.0, where the viewer was automagically enabled for me when Firefox updated itself from 18) and paste it into another text-processing program, I get random whitespace or newlines at unexpected places.

For example, take http://www.education.gov.yk.ca/pdf/pdf-test.pdf (this is the first hit when I searched for "pdf test" on Google). If I copy the first paragraph of this PDF and paste it into memo pad, the word "computer" is split into "comput" and "er" with a line break in between.

With the following Japanese PDF (from the Information Processing Society of Japan), if I copy and paste the first paragraph starting with (1) (about five lines into the main text), voila! Every character ends up on its own line: the text is pasted as a series of single-character lines. This is not what I expected. Copy and paste is useless in this case.

Funny, another Japanese PDF (a government leaflet) allows copy and paste to work as expected: http://www.meti.go.jp/statistics/toppage/topics/pamphlet/pdf/h21shokai.pdf If you copy the remark in the central light-colored rounded rectangle on the title page, copy and paste works as expected. I tested a paragraph a few pages later, and copy and paste worked as expected again.

Maybe there is a method of PDF creation that would allow PDF.js to handle copy and paste correctly, but we can't dictate how PDFs are created. Not all the PDFs on the Internet are created equal in this regard, and PDF.js needs to pay a little more attention to this issue IMHO.

BTW, I tried copy and paste on the last two Japanese PDFs using Adobe Acrobat 10.1.6, and they work as expected. But I was surprised to find that the Yukon education PDF test file (the first PDF) does not seem to allow the copy operation (!). How interesting that PDF.js allows us to bypass the light-hearted copy prohibition :-)

Bug 429859 (copying text from some pages inserts garbage characters between the characters on the page) may be related to the issue at hand, although it is an old issue. Back then it was reported that a NUL character was inserted after every character. TIA
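To make the symptoms above easier to reproduce and compare across PDFs, here is a minimal sketch that measures the damage in a pasted string. It is not part of PDF.js or Firefox; the function names `single_char_line_ratio` and `find_spurious_breaks` are hypothetical helpers made up for illustration, and the sketch assumes the pasted text differs from the expected text only by extra whitespace or NUL characters.

```python
def single_char_line_ratio(pasted: str) -> float:
    """Fraction of non-empty pasted lines that hold exactly one character.

    Hypothetical helper: a value near 1.0 matches the "every character on
    its own line" symptom described for the IPSJ PDF.
    """
    lines = [ln.strip() for ln in pasted.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(1 for ln in lines if len(ln) == 1) / len(lines)


def find_spurious_breaks(expected: str, pasted: str) -> list[int]:
    """Offsets into `expected` where `pasted` has an extra whitespace or NUL.

    Hypothetical helper: assumes `pasted` is `expected` plus inserted
    whitespace/NUL characters only; anything else raises ValueError.
    """
    offsets = []
    i = 0  # position of the next unmatched character in `expected`
    for ch in pasted:
        if i < len(expected) and ch == expected[i]:
            i += 1
        elif ch.isspace() or ch == "\x00":
            offsets.append(i)
        else:
            raise ValueError(f"unexpected character {ch!r} at expected offset {i}")
    return offsets


# The "computer" split reported for the Yukon test PDF.
print(find_spurious_breaks("computer", "comput\ner"))  # [6]
```

The NUL check covers the pattern mentioned for bug 429859, where a NUL followed every character; the ratio function covers the one-character-per-line case.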
Reporter
Comment 1•11 years ago
My bad, I forgot to insert the IPSJ URL for the second PDF: http://ime.nu/www.itscj.ipsj.or.jp/nenkan/nenkan08.pdf Someone reported that he/she could not even read this page (it was not displayed correctly), but at least on my PC with the said OS, it is readable. TIA
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE